Bug 1359375 - [Scale Testing] nsenter: Unable to fork: Cannot allocate memory
Summary: [Scale Testing] nsenter: Unable to fork: Cannot allocate memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-server-container
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: CNS 3.4
Assignee: Mohamed Ashiq
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1385246
 
Reported: 2016-07-23 11:41 UTC by Prasanth
Modified: 2017-01-18 14:59 UTC
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 14:59:18 UTC
Embargoed:


Attachments
sosreport-master (7.08 MB, application/x-xz)
2016-07-23 11:53 UTC, Prasanth
no flags
sosreport-from-rebooted-node (17.34 MB, application/x-xz)
2016-07-23 12:03 UTC, Prasanth
no flags
sosreport1-from-problematic_node_during-issue (14.13 MB, application/x-xz)
2016-07-25 13:41 UTC, Prasanth
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1398235 0 urgent CLOSED Local mounts from openshift nodes gets unmounted while deploying glusterfs container 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2017:0149 0 normal SHIPPED_LIVE rhgs-server-docker bug fix and enhancement update 2017-01-18 20:08:41 UTC

Internal Links: 1398235

Description Prasanth 2016-07-23 11:41:31 UTC
Description of problem:

I'm seeing an issue in my Aplo scale setup with 100 volumes: after rebooting one of the OpenShift nodes that hosts the Red Hat Gluster Storage container, the gluster pod fails to reach 'Running' status, and the RHGS container keeps restarting.

###########
# oc get pods
NAME                                                     READY     STATUS    RESTARTS   AGE
aplo-router-1-lyxog                                      1/1       Running   0          2d
glusterfs-dc-dhcp41-198.lab.eng.blr.redhat.com-1-3neud   1/1       Running   1          5h
glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   0/1       Running   9          1h
glusterfs-dc-dhcp41-202.lab.eng.blr.redhat.com-1-4fmtg   1/1       Running   0          5h
heketi-1-ptp43
###########

Here are the error messages I'm seeing in the oc events:

######
21m       17m       3         glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod       spec.containers{glusterfs}   Warning   Unhealthy   {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Readiness probe failed: nsenter: Unable to fork: Cannot allocate memory

21m       21m       1         glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod       spec.containers{glusterfs}   Normal    Killing     {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Killing container with docker id 76e749a899e4: pod "glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi_aplo(85f029a6-50b7-11e6-bf80-525400d359a6)" container "glusterfs" is unhealthy, it will be killed and re-created.
21m       21m       1         glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod       spec.containers{glusterfs}   Normal    Created     {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Created container with docker id 73ba92838243
#########
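The `nsenter: Unable to fork: Cannot allocate memory` failure means the readiness probe's nsenter call failed at fork() time, which usually points at memory pressure on the node rather than a problem inside the container. A few hedged triage commands (standard Linux tools, assuming a root shell on the affected node):

```shell
# Quick memory-pressure triage on the node. fork() returns ENOMEM when the
# kernel cannot commit memory for the child, so check free RAM/swap headroom,
# the overcommit policy, and the biggest memory consumers.
free -m                                    # RAM and swap headroom
cat /proc/sys/vm/overcommit_memory         # 0, 1, or 2: kernel overcommit policy
ps --sort -rss -eo rss,pid,command | head  # processes with the largest RSS
```

The `ps` invocation is the same one used later in this bug to list top RSS consumers.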

I've seen this issue in two different scale setups with 100 gluster volumes. However, I didn't encounter it in a 50-volume scale setup.

Version-Release number of selected component (if applicable):


How reproducible: Always


Steps to Reproduce:
1. Create 100 volumes using heketi-cli
2. Create PVs, PVCs, and apps for those volumes
3. Reboot one of the OpenShift nodes that hosts the Red Hat Gluster Storage container
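A minimal sketch of step 1. The heketi-cli invocation is echoed rather than executed so the loop runs anywhere; drop the `echo` against a live heketi server. The 10 GB size and replica-3 type are illustrative assumptions, not taken from this report:

```shell
# Sketch of the 100-volume creation loop. 'echo' keeps this runnable
# without a cluster; remove it to actually create the volumes.
created=0
for i in $(seq 1 100); do
    echo "heketi-cli volume create --size=10 --replica=3"
    created=$((created + 1))
done
echo "volumes requested: ${created}"
```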

Actual results: The gluster pod does NOT come back up; it keeps restarting.

Expected results: The gluster pod should come back up after a node reboot without any issues.


Additional info: I'll be updating this bug with the sosreports from the node and the master.

Comment 2 Prasanth 2016-07-23 11:53:42 UTC
Created attachment 1183078 [details]
sosreport-master

Comment 3 Prasanth 2016-07-23 12:03:15 UTC
Created attachment 1183091 [details]
sosreport-from-rebooted-node

Comment 6 Humble Chirammal 2016-07-25 09:06:42 UTC
(In reply to Prasanth from comment #3)
> Created attachment 1183091 [details]
> sosreport-from-rebooted-node

I went through the attached sosreport, and it seems to me that the sosreport of the problematic node was captured later, or while the issue was not present. Can you please capture a sosreport from the problematic node while the issue is occurring and attach it?

Comment 7 Prasanth 2016-07-25 13:41:58 UTC
Created attachment 1183810 [details]
sosreport1-from-problematic_node_during-issue

Comment 8 Prasanth 2016-07-25 13:50:30 UTC
(In reply to Humble Chirammal from comment #6)
> (In reply to Prasanth from comment #3)
> > Created attachment 1183091 [details]
> > sosreport-from-rebooted-node
> 
> I went through the attached sosreport and it seems to me that  the sosreport
> of the problematic node is captured later or when issue is not present. Can
> you please capture a sosreport from the problematic node when we hit the
> issue and attach ?

As requested, I've captured the sosreport from the problematic node while the issue was occurring and attached it to this BZ.

Comment 19 krishnaram Karthick 2016-11-08 16:55:16 UTC
I'm seeing this bug on a CNS 3.4 setup with the following build. It occurs after a node reboot.

openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

snippet of oc describe pod
==============================
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----						-------------			--------	------		-------
  4m		4m		1	{default-scheduler }								Normal		Scheduled	Successfully assigned glusterfs-dc-dhcp46-226.lab.eng.blr.redhat.com-1-maz6h to dhcp46-226.lab.eng.blr.redhat.com
  3m		3m		1	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal		Created		Created container with docker id 0fd2d0e12f0b; Security:[seccomp=unconfined]
  3m		3m		1	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal		Started		Started container with docker id 0fd2d0e12f0b
  1m		1m		2	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Warning		Unhealthy	Readiness probe failed: nsenter: Unable to fork: Cannot allocate memory

  1m	1m	1	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal	Killing		Killing container with docker id 0fd2d0e12f0b: pod "glusterfs-dc-dhcp46-226.lab.eng.blr.redhat.com-1-maz6h_storage-project(d477e985-a5d2-11e6-97a3-005056b3a033)" container "glusterfs" is unhealthy, it will be killed and re-created.
  4m	1m	2	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal	Pulling		pulling image "rhgs3/rhgs-server-rhel7"
  3m	1m	2	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal	Pulled		Successfully pulled image "rhgs3/rhgs-server-rhel7"
  1m	1m	1	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal	Created		Created container with docker id c51ea1e401b6; Security:[seccomp=unconfined]
  1m	1m	1	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Normal	Started		Started container with docker id c51ea1e401b6
  2m	18s	4	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Warning	Unhealthy	Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

  2m	8s	8	{kubelet dhcp46-226.lab.eng.blr.redhat.com}	spec.containers{glusterfs}	Warning	Unhealthy	Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)


# ps --sort -rss -eo rss,pid,command | head
  RSS    PID COMMAND
152748  1447 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
110612   738 /usr/lib/systemd/systemd-journald
99420   6392 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
90352    980 /usr/sbin/dmeventd -f
77332 112183 /usr/bin/docker-current daemon --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-driver=json-file --log-opt max-size=50m --add-registry registry.ops.openshift.com --add-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --add-registry registry.access.redhat.com --insecure-registry registry.ops.openshift.com --insecure-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
59668   3031 /usr/sbin/rsyslogd -n
50572 112736 /usr/bin/openshift-router
47976 113727 /usr/bin/openshift-deploy
37196   6901 journalctl -k -f

Sosreports will be attached shortly.

P.S.: Please note that the issue was seen even after adding 16 GB of swap in addition to the existing 32 GB of RAM. However, the sosreport was collected before the swap was added.

Comment 22 Michael Adam 2016-11-11 11:54:43 UTC
The system setup does not seem sufficient.
The system should reserve 32 GB of RAM for the gluster container,
and have additional memory for the base system, other containers, etc.

Here is the guide, sec. 3.2.5:

https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/single/container-native-storage-for-openshift-container-platform/

Please retest with such a system.

Comment 23 krishnaram Karthick 2016-11-15 06:09:26 UTC
(In reply to Michael Adam from comment #22)
> The system setup does not seem sufficient.
> The system should reserve 32 GB of RAM for the gluster container.
> And have more memory for the system and containers, etc.
> 
> Here is the guide , sec. 3.2.5.:
> 
> https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/
> single/container-native-storage-for-openshift-container-platform/

Michael,

I think we should be clearer in documenting the memory requirements here. We say each "physical node" hosting an RHGS peer needs a minimum of 32 GB RAM, but from comment #22 I understand that 32 GB is needed for the "gluster container" alone. That means the "physical node" needs 32 GB RAM (for the gluster container) plus additional memory (for other resources) for the CNS solution to run, so there is a gap in what we are advertising. Do we have a recommendation for what that "additional memory" should be? At least consider the case where the physical node hosts only CNS and no apps: an app's memory consumption is not in our purview, but the overall memory consumption of CNS is. So we should at least have a recommended amount of memory per node for CNS to work without any memory issues. Allocating further memory for apps to run alongside CNS can be left to the user.

contents of sec 3.2.5
=====================
     Ensure that the Trusted Storage Pool is not scaled beyond 100 volumes per 3 nodes per 32G of RAM.
    A trusted storage pool consists of a minimum of 3 nodes/peers.
    Distributed-Three-way replication is the only supported volume type.
    Each physical node that needs to host a Red Hat Gluster Storage peer:
        will need a minimum of 32GB RAM.
        is expected to have the same disk type.
        by default the heketidb utilises 32 GB distributed replica volume. 
    Red Hat Gluster Storage Container Native with OpenShift Container Platform supports up to 14 snapshots per volume. 

> 
> Please retest with such a system.

QE will be able to retest once we have a recommendation from DEV on memory requirements for CNS.

Comment 24 Michael Adam 2016-11-15 12:29:59 UTC
(In reply to krishnaram Karthick from comment #23)
> (In reply to Michael Adam from comment #22)
> > The system setup does not seem sufficient.
> > The system should reserve 32 GB of RAM for the gluster container.
> > And have more memory for the system and containers, etc.
> > 
> > Here is the guide , sec. 3.2.5.:
> > 
> > https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/
> > single/container-native-storage-for-openshift-container-platform/
> 
> Michael,
> 
> I think we should be clear in documenting the memory requirements here. We
> say each "physical node" hosting RHGS peer will need a minimum of 32GB RAM.
> From comment#22 I understand that we would need 32GB RAM for "gluster
> container" alone. This means the "physical node" should have "32GB RAM (for
> gluster container) + additional memory (for other resources)" for CNS
> solution to run.

Correct.

> I think there is a gap in what we are advertising. Do we
> have a recommendation of what "additional memory" should be? at least
> considering physical node only host CNS and not any apps - memory
> consumption of app is not in our purview, but overall memory consumption of
> CNS is in our purview. So at least we should have a recommended memory per
> node for CNS to work without any memory issues.

Right, so the assumption is to have at least something like 48 GB of RAM
on the node, of which 32 GB are reserved for the gluster container.
The base system and OpenShift need memory to run as well, not counting the
actual apps.

Note that when mounting the volumes into containers, we need additional
memory too, since the gluster FUSE client consumes some 200-500 MB
per mount. Hence, in order to mount 50 volumes into containers, one would
need an additional ~25 GB of free RAM on the host...
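A quick back-of-the-envelope check of those figures. The 32 GB container reservation and the 200-500 MB-per-mount range are the numbers quoted above; picking the 500 MB upper bound is my assumption:

```shell
# FUSE-client overhead for 50 mounted volumes at the upper end of the
# quoted 200-500 MB range, plus the 32 GB gluster container reservation.
GLUSTER_CONTAINER_GB=32
MOUNTS=50
MB_PER_MOUNT=500
FUSE_GB=$(( MOUNTS * MB_PER_MOUNT / 1024 ))    # integer division: ~24 GB
TOTAL_GB=$(( GLUSTER_CONTAINER_GB + FUSE_GB ))
echo "fuse overhead: ${FUSE_GB} GB, total before OS/OpenShift: ${TOTAL_GB} GB"
```

This lands at roughly the ~25 GB of FUSE overhead estimated above, and shows why a 48 GB node with only 16 GB left over after the gluster reservation runs tight at high volume counts.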

> Allocating further memory
> for apps to run along with CNS can be left to the user.
> 
> contents of sec 3.2.5
> =====================
>      Ensure that the Trusted Storage Pool is not scaled beyond 100 volumes
> per 3 nodes per 32G of RAM.
>     A trusted storage pool consists of a minimum of 3 nodes/peers.
>     Distributed-Three-way replication is the only supported volume type.
>     Each physical node that needs to host a Red Hat Gluster Storage peer:
>         will need a minimum of 32GB RAM.
>         is expected to have the same disk type.
>         by default the heketidb utilises 32 GB distributed replica volume. 
>     Red Hat Gluster Storage Container Native with OpenShift Container
> Platform supports up to 14 snapshots per volume. 
> 
> > 
> > Please retest with such a system.
> 
> QE will be able to retest once we have a recommendation from DEV on memory
> requirements for CNS.

Right. I am going to prepare a detailed mail with an analysis
of memory consumption on client and server, and come back with any
new findings. (Note the result may be that we need even more than
32 GB of RAM for the gluster container...)

But for now, please retest with a system with 48 GB of RAM, of which
(at least) 32 GB are reserved for the gluster container.
(This is assuming that you have *one* gluster container per node and
that you are using the standard replica-3 volumes).

Thanks -- Michael

Comment 26 krishnaram Karthick 2016-12-28 04:34:38 UTC
The issue reported in this bug is no longer seen after increasing the memory of the test machines to 48 GB.

Rebooting pods and nodes brought the gluster pods back up without any issues.

Moving the bug to verified.

Comment 28 errata-xmlrpc 2017-01-18 14:59:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0149

