Description of problem:

*** This bug is created as a clone of 1601874 ***

While creating and deleting 200 PVs in a loop and running "gluster volume heal" in the gluster pods, 2 of the 4 gluster pods are in 0/1 state.

Version-Release number of selected component (if applicable):

# rpm -qa | grep openshift
openshift-ansible-roles-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-master-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-sdn-ovs-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-docs-3.9.31-1.git.34.154617d.el7.noarch
openshift-ansible-playbooks-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-docker-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-node-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-clients-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-3.9.31-1.git.34.154617d.el7.noarch

# oc rsh glusterfs-storage-mrfh4
sh-4.2# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
gluster-block-0.2.1-14.1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-cli-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-debuginfo-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64

# oc rsh heketi-storage-1-55bw4
sh-4.2# rpm -qa | grep heketi
python-heketi-6.0.0-7.4.el7rhgs.x86_64
heketi-client-6.0.0-7.4.el7rhgs.x86_64
heketi-6.0.0-7.4.el7rhgs.x86_64

How reproducible:
1:1

Steps to Reproduce:

Setup: CNS 4-node setup, each node having a 1 TB device, CPU = 32 (4 cores), and Memory = 72 GB.

1. Created 100 1 GB mongodb pods and ran IO on them (using dd).
2. Upgraded the system from the 3.9 live build to the experian hotfix build.
3. Waited until all 4 gluster pods had spun up and were in the 1/1 Running state; all mongodb pods were also in Running state.
4. Initiated creation and deletion of 200 PVs, along with running "gluster volume heal" on all 4 gluster pods (loops below).

---------- PV creation/deletion ----------
while true
do
    for i in {101..300}
    do
        ./pvc_create.sh c$i 1
        sleep 30
    done
    sleep 40
    for i in {101..300}
    do
        oc delete pvc c$i
        sleep 20
    done
done
---------- PV creation/deletion ----------

Running gluster volume heal:

while true; do for i in $(gluster v list | grep vol); do gluster v heal $i; sleep 2; done; done

5. 2 gluster pods are in 0/1 state.

Actual results:
2 gluster pods are in 0/1 state.

Expected results:
All 4 gluster pods should be in the 1/1 Running state.

Additional info:
Logs attached.
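For reference, the pvc_create.sh helper used in step 4 is not attached to this report. Below is a minimal sketch of what such a helper could look like, assuming it takes a claim name and a size in GiB; the storage class name "glusterfs-storage" is an assumption and not taken from this report.

---------- hypothetical pvc_create.sh ----------
#!/bin/bash
# Usage: ./pvc_create.sh <claim-name> <size-in-Gi>
# Creates a PVC bound via dynamic provisioning; storage class name is assumed.
name=$1
size=$2
cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}Gi
  storageClassName: glusterfs-storage
EOF
---------- hypothetical pvc_create.sh ----------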
Karthick/Prasanth, can you ack for deferring this bug from the release?
(In reply to Humble Chirammal from comment #10)
> Karthick/Prasanth, can you ack for deferring this bug from the release?

Ack. We can move this out of 3.10.