Description of problem:
In my Aplo scale setup with 100 volumes, I'm seeing an issue when rebooting one of the OpenShift nodes that hosts the Red Hat Gluster Storage container. After the node comes back, the gluster pod fails to reach 'Running' status and the RHGS container keeps restarting.

###########
# oc get pods
NAME                                                     READY     STATUS    RESTARTS   AGE
aplo-router-1-lyxog                                      1/1       Running   0          2d
glusterfs-dc-dhcp41-198.lab.eng.blr.redhat.com-1-3neud   1/1       Running   1          5h
glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   0/1       Running   9          1h
glusterfs-dc-dhcp41-202.lab.eng.blr.redhat.com-1-4fmtg   1/1       Running   0          5h
heketi-1-ptp43
###########

The following error messages are what I'm seeing in oc events:

######
21m   17m   3   glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod   spec.containers{glusterfs}   Warning   Unhealthy   {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Readiness probe failed: nsenter: Unable to fork: Cannot allocate memory
21m   21m   1   glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod   spec.containers{glusterfs}   Normal    Killing     {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Killing container with docker id 76e749a899e4: pod "glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi_aplo(85f029a6-50b7-11e6-bf80-525400d359a6)" container "glusterfs" is unhealthy, it will be killed and re-created.
21m   21m   1   glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi   Pod   spec.containers{glusterfs}   Normal    Created     {kubelet dhcp41-200.lab.eng.blr.redhat.com}   Created container with docker id 73ba92838243
#########

I've seen this issue in 2 different scale setups with 100 gluster volumes. However, I didn't encounter it in a 50-volume scale setup.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create 100 volumes using heketi-cli (see the sketch at the end of this report)
2. Create PVs, PVCs and apps for the same
3. Reboot one of the OpenShift nodes that hosts the Red Hat Gluster Storage container

Actual results:
The gluster pod does NOT come back and it keeps restarting.

Expected results:
The gluster pod should come back after a node reboot without any issues.

Additional info:
I'll be updating this bug with the sosreports from the node and the master.
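For reference, a rough sketch of how the volumes for step 1 were created and how the restart loop can be observed after step 3. The heketi server URL and the volume size below are placeholders, not the exact values used in this setup:

# export HEKETI_CLI_SERVER=http://<heketi-route-or-service>:8080
# for i in $(seq 1 100); do heketi-cli volume create --size=10 --replica=3; done
# oc get pods -w
# oc describe pod glusterfs-dc-dhcp41-200.lab.eng.blr.redhat.com-1-biuvi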
Created attachment 1183078 [details] sosreport-master
Created attachment 1183091 [details] sosreport-from-rebooted-node
(In reply to Prasanth from comment #3)
> Created attachment 1183091 [details]
> sosreport-from-rebooted-node

I went through the attached sosreport and it seems to me that the sosreport of the problematic node was captured later, when the issue was not present. Can you please capture a sosreport from the problematic node while we are hitting the issue and attach it?
Created attachment 1183810 [details] sosreport1-from-problematic_node_during-issue
(In reply to Humble Chirammal from comment #6)
> (In reply to Prasanth from comment #3)
> > Created attachment 1183091 [details]
> > sosreport-from-rebooted-node
> 
> I went through the attached sosreport and it seems to me that the sosreport
> of the problematic node was captured later, when the issue was not present.
> Can you please capture a sosreport from the problematic node while we are
> hitting the issue and attach it?

As requested, I've captured the sosreport from the problematic node while the issue was being seen and attached it to this BZ.
I'm seeing this bug with a CNS 3.4 setup on the following build. This is seen after a node reboot.

# openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

snippet of oc describe pod
==============================
Events:
  FirstSeen   LastSeen   Count   From   SubobjectPath   Type   Reason   Message
  ---------   --------   -----   ----   -------------   ----   ------   -------
  4m   4m    1   {default-scheduler }                           Normal    Scheduled   Successfully assigned glusterfs-dc-dhcp46-226.lab.eng.blr.redhat.com-1-maz6h to dhcp46-226.lab.eng.blr.redhat.com
  3m   3m    1   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Created     Created container with docker id 0fd2d0e12f0b; Security:[seccomp=unconfined]
  3m   3m    1   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Started     Started container with docker id 0fd2d0e12f0b
  1m   1m    2   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: nsenter: Unable to fork: Cannot allocate memory
  1m   1m    1   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Killing     Killing container with docker id 0fd2d0e12f0b: pod "glusterfs-dc-dhcp46-226.lab.eng.blr.redhat.com-1-maz6h_storage-project(d477e985-a5d2-11e6-97a3-005056b3a033)" container "glusterfs" is unhealthy, it will be killed and re-created.
  4m   1m    2   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Pulling     pulling image "rhgs3/rhgs-server-rhel7"
  3m   1m    2   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Pulled      Successfully pulled image "rhgs3/rhgs-server-rhel7"
  1m   1m    1   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Created     Created container with docker id c51ea1e401b6; Security:[seccomp=unconfined]
  1m   1m    1   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Normal    Started     Started container with docker id c51ea1e401b6
  2m   18s   4   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Liveness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
     Active: inactive (dead)
  2m   8s    8   {kubelet dhcp46-226.lab.eng.blr.redhat.com}   spec.containers{glusterfs}   Warning   Unhealthy   Readiness probe failed: ● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
     Active: inactive (dead)

# ps --sort -rss -eo rss,pid,command | head
   RSS    PID COMMAND
152748   1447 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
110612    738 /usr/lib/systemd/systemd-journald
 99420   6392 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
 90352    980 /usr/sbin/dmeventd -f
 77332 112183 /usr/bin/docker-current daemon --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-driver=json-file --log-opt max-size=50m --add-registry registry.ops.openshift.com --add-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --add-registry registry.access.redhat.com --insecure-registry registry.ops.openshift.com --insecure-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
 59668   3031 /usr/sbin/rsyslogd -n
 50572 112736 /usr/bin/openshift-router
 47976 113727 /usr/bin/openshift-deploy
 37196   6901 journalctl -k -f

sosreports shall be attached shortly.

P.S.: Please note that the issue was seen even after adding 16GB of swap in addition to the existing 32GB of RAM. However, the sosreport was collected before the swap was added.
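Since the probes are failing with "nsenter: Unable to fork: Cannot allocate memory", the memory pressure on the rebooted node can be cross-checked directly. A generic sketch (the pod name is the one from the events above):

# free -m
# ps --sort -rss -eo rss,pid,command | head -20
# journalctl -k | grep -i -e "out of memory" -e "oom"
# oc describe pod glusterfs-dc-dhcp46-226.lab.eng.blr.redhat.com-1-maz6h

free -m shows overall RAM and swap usage, the ps invocation lists the largest resident processes (as captured above), the journalctl grep checks whether the kernel OOM killer has fired, and oc describe pod shows the kubelet's probe failures and restart events.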
The system setup does not seem sufficient.
The system should reserve 32 GB of RAM for the gluster container,
and have more memory for the system, containers, etc.

Here is the guide, sec. 3.2.5:

https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/single/container-native-storage-for-openshift-container-platform/

Please retest with such a system.
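For reference, one way such a reservation could be expressed is a memory request on the glusterfs container of the gluster DeploymentConfig, roughly along these lines. This is an illustrative sketch only, not the shipped template; the container name "glusterfs" is taken from the pod events above and the exact object name depends on the deployment:

spec:
  template:
    spec:
      containers:
      - name: glusterfs
        resources:
          requests:
            memory: 32Gi

Whether the CNS templates set such a request is a separate question; the immediate point is that the node itself must have that memory available.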
(In reply to Michael Adam from comment #22)
> The system setup does not seem sufficient.
> The system should reserve 32 GB of RAM for the gluster container,
> and have more memory for the system, containers, etc.
> 
> Here is the guide, sec. 3.2.5:
> 
> https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/
> single/container-native-storage-for-openshift-container-platform/

Michael,

I think we should be clear in documenting the memory requirements here. We say each "physical node" hosting an RHGS peer will need a minimum of 32GB RAM. From comment #22 I understand that we would need 32GB RAM for the "gluster container" alone. This means the "physical node" should have "32GB RAM (for the gluster container) + additional memory (for other resources)" for the CNS solution to run.

I think there is a gap in what we are advertising. Do we have a recommendation for what the "additional memory" should be, at least for the case where the physical node hosts only CNS and no apps? The memory consumption of the apps is not in our purview, but the overall memory consumption of CNS is. So we should at least have a recommended amount of memory per node for CNS to work without any memory issues. Allocating further memory for apps to run along with CNS can be left to the user.

contents of sec 3.2.5
=====================
Ensure that the Trusted Storage Pool is not scaled beyond 100 volumes per 3 nodes per 32G of RAM.
A trusted storage pool consists of a minimum of 3 nodes/peers.
Distributed-Three-way replication is the only supported volume type.
Each physical node that needs to host a Red Hat Gluster Storage peer:
  - will need a minimum of 32GB RAM.
  - is expected to have the same disk type.
By default the heketidb utilises a 32 GB distributed replica volume.
Red Hat Gluster Storage Container Native with OpenShift Container Platform supports up to 14 snapshots per volume.

> 
> Please retest with such a system.

QE will be able to retest once we have a recommendation from DEV on memory requirements for CNS.
(In reply to krishnaram Karthick from comment #23)
> (In reply to Michael Adam from comment #22)
> > The system setup does not seem sufficient.
> > The system should reserve 32 GB of RAM for the gluster container,
> > and have more memory for the system, containers, etc.
> > 
> > Here is the guide, sec. 3.2.5:
> > 
> > https://access.redhat.com/documentation/en/red-hat-gluster-storage/3.1/
> > single/container-native-storage-for-openshift-container-platform/
> 
> Michael,
> 
> I think we should be clear in documenting the memory requirements here. We
> say each "physical node" hosting an RHGS peer will need a minimum of 32GB
> RAM. From comment #22 I understand that we would need 32GB RAM for the
> "gluster container" alone. This means the "physical node" should have "32GB
> RAM (for the gluster container) + additional memory (for other resources)"
> for the CNS solution to run.

Correct.

> I think there is a gap in what we are advertising. Do we have a
> recommendation for what the "additional memory" should be, at least for the
> case where the physical node hosts only CNS and no apps? The memory
> consumption of the apps is not in our purview, but the overall memory
> consumption of CNS is. So we should at least have a recommended amount of
> memory per node for CNS to work without any memory issues.

Right, so the assumption is to have at least something like 48 GB of RAM on the node, of which 32 GB are reserved for the gluster container. The actual system and OpenShift also need memory to run, not counting the actual apps.

Note that for mounting the volumes into containers we need additional memory as well, since the gluster fuse client consumes some 200-500 MB per mount. Hence, in order to mount 50 volumes into containers, one would need an additional 25 GB of free RAM on the host...

> Allocating further memory for apps to run along with CNS can be left to the
> user.
> 
> contents of sec 3.2.5
> =====================
> Ensure that the Trusted Storage Pool is not scaled beyond 100 volumes per 3
> nodes per 32G of RAM.
> A trusted storage pool consists of a minimum of 3 nodes/peers.
> Distributed-Three-way replication is the only supported volume type.
> Each physical node that needs to host a Red Hat Gluster Storage peer:
>   - will need a minimum of 32GB RAM.
>   - is expected to have the same disk type.
> By default the heketidb utilises a 32 GB distributed replica volume.
> Red Hat Gluster Storage Container Native with OpenShift Container Platform
> supports up to 14 snapshots per volume.
> 
> > 
> > Please retest with such a system.
> 
> QE will be able to retest once we have a recommendation from DEV on memory
> requirements for CNS.

Right, I am going to prepare a detailed mail with an analysis of the memory consumption on the client and server side and hence come up with new findings. (Note that the result may be that we need even more than 32GB of RAM for the gluster container...)

But for now, please retest with a system with 48GB of RAM, of which (at least) 32GB are reserved for the gluster container. (This assumes that you have *one* gluster container per node and that you are using the standard replica-3 volumes.)

Thanks -- Michael
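P.S.: For illustration, the per-mount estimate above works out like this. This is a rough sketch only, assuming the upper end of the 200-500 MB per-mount range and that the cost scales linearly with the number of mounts:

# awk 'BEGIN { mounts = 50; per_mount_gb = 0.5; printf "%d fuse mounts * %.1f GB = ~%.0f GB additional RAM on the mounting host\n", mounts, per_mount_gb, mounts * per_mount_gb }'
50 fuse mounts * 0.5 GB = ~25 GB additional RAM on the mounting host

That 25 GB is on top of the 32 GB reserved for the gluster container and whatever the system and OpenShift themselves need.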
The issue reported in this bug is no longer seen after increasing the memory of the test machines to 48GB. Rebooting pods and nodes spawned gluster pods without any issues. Moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:0149