Bug 1465464
| Summary: | Kibana container grows in memory till out of memory | | | |
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> | |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> | |
| Status: | CLOSED WONTFIX | QA Contact: | Junqi Zhao <juzhao> | |
| Severity: | medium | Docs Contact: | | |
| Priority: | unspecified | | | |
| Version: | 3.4.1 | CC: | aos-bugs, erich, hhorak, jcantril, juzhao, jwozniak, pportant, stwalter, wsun | |
| Target Milestone: | --- | Keywords: | Reopened | |
| Target Release: | 3.4.z | | | |
| Hardware: | Unspecified | | | |
| OS: | Unspecified | | | |
| Whiteboard: | | | | |
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | Fix: Use underscores instead of dashes for the memory switch. Result: Memory settings are respected by the nodejs runtime. | Story Points: | --- | |
| Clone Of: | | | | |
| : | 1469711 (view as bug list) | Environment: | | |
| Last Closed: | 2017-09-04 02:15:40 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | | |
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | | |
| Cloudforms Team: | --- | Target Upstream Version: | | |
| Embargoed: | | | | |
| Bug Depends On: | 1474689 | | | |
| Bug Blocks: | 1469711 | | | |
| Attachments: | | | | |
Description
Steven Walter
2017-06-27 13:33:41 UTC
*** This bug has been marked as a duplicate of bug 1464020 ***

As per conversation, re-opened as a separate issue from bug 1464020; that bug is specifically for the kibana-proxy container and this one is for the kibana container. I believe this has been fixed upstream and needs to be backported.

Looking at the 3.4 release, I don't see where the proxy is propagating the memory request to the nodejs runtime. Marking as upcoming release to remove from the 3.6 blocker list.

Strike comment #5, as I pasted it into the wrong issue.

@Steven, can you provide the version of the image you are using for kibana? Something like 3.4.1-XX. That would give me a better idea what specifically was tested other than the sha.

Found a few posts that seem to indicate we need to configure --max_old_space_size. [1] https://github.com/nodejs/node/issues/7937 [2] https://github.com/elastic/kibana/issues/9006. Note that, per the node docs, the flag should have underscores and not dashes. (A sketch of passing this flag appears after the pod output below.)

Where do we specify --max_old_space_size?

Deployed logging 3.4.1 and let it run for a few hours; during this period, created a few projects to populate logs. Describing the kibana pod shows OOMKilled for both the kibana and kibana-proxy containers.
# oc describe po ${kibana-pods}
Containers:
kibana:
..........
Port:
Limits:
memory: 736Mi
Requests:
memory: 736Mi
State: Running
Started: Wed, 26 Jul 2017 18:06:26 -0400
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 26 Jul 2017 12:43:40 -0400
Finished: Wed, 26 Jul 2017 18:06:24 -0400
kibana-proxy:
...............
Port: 3000/TCP
Limits:
memory: 96Mi
Requests:
memory: 96Mi
State: Running
Started: Wed, 26 Jul 2017 20:15:41 -0400
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 26 Jul 2017 12:43:44 -0400
Finished: Wed, 26 Jul 2017 20:15:39 -0400
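For reference, a minimal sketch of how the container memory limit could be propagated to the nodejs runtime using the underscore form of the flag discussed above. The variable name, entry point, and the halving heuristic (taken from the later comments about PR 529) are illustrative, not the image's actual startup script:

# Illustrative only: derive the node heap cap from the container memory limit (MiB)
# and pass it with underscores, as noted in the node docs discussion above.
KIBANA_MEMORY_LIMIT_MIB=${KIBANA_MEMORY_LIMIT_MIB:-736}   # hypothetical env var; 736Mi is the kibana limit seen in this bug
MAX_OLD_SPACE=$((KIBANA_MEMORY_LIMIT_MIB / 2))            # half of the pod limit, per the later PR 529 discussion
exec node --max_old_space_size=${MAX_OLD_SPACE} ./src/cli # placeholder entry point for kibana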
Created attachment 1305197 [details]
kibana pod info
Created a few projects to populate logs and ran the following twice:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
The kibana pod restarted 3 times and the kibana-ops pod restarted 2 times. Although the kibana container did not terminate with an OOMKilled error, the kibana-proxy container did.
# oc get po
NAME READY STATUS RESTARTS AGE
java-mainclass-1-56prz 1/1 Running 0 44m
logging-curator-1-gw3l4 1/1 Running 0 3h
logging-curator-ops-1-0neb5 1/1 Running 0 3h
logging-deployer-zjpl7 0/1 Completed 0 3h
logging-es-axorjnf6-1-ivr7c 1/1 Running 0 3h
logging-es-ops-plqwi2jt-1-5atpn 1/1 Running 0 3h
logging-fluentd-s318q 1/1 Running 0 3h
logging-kibana-1-4aqqx 2/2 Running 3 3h
logging-kibana-ops-1-d39aj 2/2 Running 2 3h
# oc describe po ${kibana-pods} (see the attached file for more info)
Containers:
kibana:
.......................
Limits:
memory: 736Mi
Requests:
memory: 736Mi
State: Running
Started: Tue, 08 Aug 2017 04:38:32 -0400
Ready: True
Restart Count: 0
.......................
kibana-proxy:
.......................
Limits:
memory: 96Mi
Requests:
memory: 96Mi
State: Running
Started: Tue, 08 Aug 2017 07:45:02 -0400
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Tue, 08 Aug 2017 07:14:19 -0400
Finished: Tue, 08 Aug 2017 07:44:59 -0400
Ready: True
Restart Count: 3
Created attachment 1310599 [details]
OOMKilled for container kibana-proxy
Verified with logging-kibana:3.4.1-28 using the same steps as in Comment 22; the kibana-proxy container still terminated with an OOMKilled error. See the attached file for more info.
# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-2gij7           1/1       Running     0          1h
logging-curator-ops-1-yeau9       1/1       Running     0          1h
logging-deployer-nj623            0/1       Completed   0          1h
logging-es-7dw1dcne-1-86w7j       1/1       Running     0          1h
logging-es-ops-bv2i5e2v-1-8fu68   1/1       Running     0          1h
logging-fluentd-ydj7a             1/1       Running     0          1h
logging-kibana-1-i0gdv            2/2       Running     2          1h
logging-kibana-ops-1-68105        2/2       Running     2          1h
Created attachment 1314999 [details]
OOMKilled for container kibana-proxy -20170818
In order to debug this, we need to be sure we add up all the "limits" for pods running on the node, compare against the total memory on the node and the node reserve configured in /etc/origin/node/node-config.yml, and see if everything "fits". Second, we need to make sure the "requests" size is the same as the "limits" explicitly set in all the DCs and DSs. (A rough sketch of this check appears after the verification steps below.)

Changed resources.limits.memory from the default 96Mi to 150Mi for kibana-proxy; no OOM error was found for either the kibana or kibana-proxy container.
See the attached file.
That makes sense: the OOM would occur if resources.limits.memory is too small.
Steps:
Created a few projects to populate logs and ran the following four times:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
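As a rough illustration of the check described in the earlier comment (summing the limits of pods on the node, accounting for the node reserve, and confirming requests match limits), something like the following could be used. The node name is a placeholder and the config path is the one given above:

# Allocatable memory on the node and the summed requests/limits of the pods scheduled there
# (placeholder node name; pick the node running the logging pods).
oc describe node node1.example.com | grep -A 10 "Allocated resources"

# Node reserve, if configured, sits under kubeletArguments in the node config (path as given above).
grep -A 2 -E "kube-reserved|system-reserved" /etc/origin/node/node-config.yml

# Confirm requests match the limits explicitly set in the logging DCs.
oc get dc -n logging -o yaml | grep -A 6 "resources:"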
Created attachment 1315146 [details]
changed resources.limits.memory to 150Mi, no OOM error now
No OOM error shows now.
Verification steps:
Verified with logging-kibana:3.4.1-28: changed resources.limits.memory from the default 96Mi to a larger value, created a few projects to populate logs, and ran the following four times:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
When we adjusted the calculation of max-old-space-size to half [1] of what the pod receives, another PR to openshift-ansible [2] got lost in the process. When it merges, the default memory should be set to a conservative 256MB, which should suffice. (A back-of-the-envelope sketch of this arithmetic appears after the attachment note below.)

[1] https://github.com/openshift/origin-aggregated-logging/pull/529
[2] https://github.com/openshift/openshift-ansible/pull/4761

Don't we also need a fix to halve max-old-space-size for the kibana-proxy as well?

Created attachment 1319837 [details]
resources.limits.memory is still 96Mi for kibana-proxy container
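Back-of-the-envelope arithmetic for the halving logic described above (a sketch only; the exact calculation lives in PR 529 and is not reproduced here):

# With the default kibana-proxy limit of 96Mi, halving leaves roughly 48MB of old space for node,
# which appears to be too little under load (hence the OOMKilled state above); the proposed
# 256MB default would leave roughly 128MB.
for limit_mib in 96 256; do
    echo "container limit ${limit_mib}Mi -> --max_old_space_size=$((limit_mib / 2))"
done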
Wei, we should encourage users of 3.4 to migrate to 3.5. The workaround for this issue is to deploy logging and then edit the logging-kibana DeploymentConfig:

1. Set the env variable OCP_AUTH_PROXY_MEMORY_LIMIT to 256Mi for the kibana-proxy container. (A CLI sketch of this workaround appears at the end of this report.)

The workaround in Comment 40 works; there is no OOM error for the kibana-proxy container. Since we are not going to fix it for logging 3.4, and there is no documentation encouraging users to migrate to 3.5, I think we should close it as WONTFIX and remove this defect from the errata.

Verification step: create a few projects to populate logs and run:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done

@Jeff, I have reported a documentation defect for this issue; it describes the workaround if customers come across this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1488001. We closed this defect as WONTFIX, are you OK with this solution?

Yes.
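For completeness, a sketch of applying the Comment 40 workaround from the CLI. The env variable name is the one given above; the container and DC names are the ones seen in this bug, and logging-kibana-ops would need the same change if deployed:

# Raise the memory available to the auth proxy via the DC environment variable.
oc set env dc/logging-kibana -c kibana-proxy OCP_AUTH_PROXY_MEMORY_LIMIT=256Mi

# If the container limit itself also needs to grow (as QE did earlier in this bug),
# edit the DC and raise resources.limits.memory / resources.requests.memory for kibana-proxy.
oc edit dc/logging-kibana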