Bug 1605158 - [Tracker] While creating and deleting PVs in a loop and running gluster volume heal in the gluster pods, 2 of the 4 gluster pods are in 0/1 state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: kubernetes
Version: cns-3.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Humble Chirammal
QA Contact: Prasanth
URL:
Whiteboard:
Depends On: 1601874
Blocks:
 
Reported: 2018-07-20 10:22 UTC by vinutha
Modified: 2019-04-14 14:41 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-14 14:41:15 UTC
Embargoed:


Attachments


Links
System ID:    Red Hat Bugzilla 1560428
Private:      0
Priority:     urgent
Status:       CLOSED
Summary:      pods are stuck in containercreating with error grpc: the connection is unavailable
Last Updated: 2022-03-13 14:48:20 UTC

Internal Links: 1560428

Description vinutha 2018-07-20 10:22:56 UTC
Description of problem:

*** This bug is created as a clone of 1601874 ***

While creating and deleting 200 PVs in a loop and running gluster volume heal in the gluster pods, 2 of the 4 gluster pods are in 0/1 state.


Version-Release number of selected component (if applicable):
# rpm -qa| grep openshift
openshift-ansible-roles-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-master-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-sdn-ovs-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-docs-3.9.31-1.git.34.154617d.el7.noarch
openshift-ansible-playbooks-3.9.31-1.git.34.154617d.el7.noarch
atomic-openshift-docker-excluder-3.9.31-1.git.0.ef9737b.el7.noarch
atomic-openshift-node-3.9.31-1.git.0.ef9737b.el7.x86_64
atomic-openshift-clients-3.9.31-1.git.0.ef9737b.el7.x86_64
openshift-ansible-3.9.31-1.git.34.154617d.el7.noarch

# oc rsh glusterfs-storage-mrfh4 
sh-4.2# rpm -qa| grep gluster 
glusterfs-libs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
gluster-block-0.2.1-14.1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-cli-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64
glusterfs-debuginfo-3.8.4-54.10.el7rhgs.1.HOTFIX.CASE02129707.BZ1484412.x86_64

# oc rsh heketi-storage-1-55bw4 
sh-4.2# rpm -qa| grep heketi
python-heketi-6.0.0-7.4.el7rhgs.x86_64
heketi-client-6.0.0-7.4.el7rhgs.x86_64
heketi-6.0.0-7.4.el7rhgs.x86_64

How reproducible:
1:1

Steps to Reproduce:

CNS setup with 4 nodes; each node has one 1 TB device, CPU = 32 (4 cores), and 72 GB of memory.

1. Created 100 mongodb pods (1 GB each) and ran IO on them using dd (a sketch of that load follows these steps).

2. Upgraded the system from the 3.9 live build to the Experian hotfix build.

3. After the upgrade, all 4 gluster pods spun up and were in 1/1 Running state, and all mongodb pods were also in Running state.

4. Initiated creation and deletion of 200 PVs in a loop, along with running gluster volume heal on all 4 gluster pods.

---- creation and deletion of pvs ----------
while true
do
    for i in {101..300}
    do
        ./pvc_create.sh c$i 1; sleep 30;
    done

    sleep 40

    for i in {101..300}
    do
        oc delete pvc c$i; sleep 20;
    done
done
---------------------pv creation/deletion-------------
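
The pvc_create.sh helper itself is not attached to this bug; the sketch below is only a guess at what it does, assuming the two arguments are the claim name and the requested size in Gi, and assuming the CNS storage class is named glusterfs-storage (both assumptions, not taken from the report).

---- pvc_create.sh (hypothetical sketch) ----
#!/bin/bash
# Usage: ./pvc_create.sh <claim-name> <size-in-Gi>
# Creates a PVC bound to the (assumed) glusterfs-storage storage class.
NAME=$1
SIZE=${2:-1}

oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${NAME}
spec:
  storageClassName: glusterfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${SIZE}Gi
EOF
---------------------------------------------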

Running gluster volume heal on each gluster pod: while true; do for i in $(gluster v list | grep vol); do gluster v heal $i; sleep 2; done; done

5. Observed that 2 of the 4 gluster pods are in 0/1 state.
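
The exact dd invocation from step 1 is not recorded in this bug; the loop below is only a sketch of that kind of load, and the app=mongodb label selector and the /var/lib/mongodb/data mount path are assumptions.

---- IO load on the mongodb pods (hypothetical sketch) ----
# Run a dd write inside every mongodb pod in parallel.
for pod in $(oc get pods -l app=mongodb -o name); do
    oc exec "${pod##*/}" -- dd if=/dev/zero of=/var/lib/mongodb/data/io_test bs=1M count=512 &
done
wait
-----------------------------------------------------------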


Actual results:
2 of the 4 gluster pods are in 0/1 (not Ready) state.

Expected results:
All 4 gluster pods should be in 1/1 Running state. 

Additional info:
Logs are attached.
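
For anyone triaging a reproduction of this, the commands below are one way to see which gluster pods are failing their readiness probe and why; the glusterfs namespace and the glusterfs=storage-pod label are assumptions and may differ per deployment.

---- checking the not-ready gluster pods (sketch) ----
# Which gluster pods are not 1/1, and on which nodes?
oc get pods -n glusterfs -l glusterfs=storage-pod -o wide

# Readiness probe failures show up in the pod events.
oc describe pod <glusterfs-pod-name> -n glusterfs

# Check glusterd and the volumes from inside an affected pod.
oc exec <glusterfs-pod-name> -n glusterfs -- systemctl status glusterd
oc exec <glusterfs-pod-name> -n glusterfs -- gluster volume status
------------------------------------------------------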

Comment 10 Humble Chirammal 2018-07-24 15:56:43 UTC
Karthick/Prasanth, can you ack for deferring this bug from the release?

Comment 11 krishnaram Karthick 2018-07-25 09:28:37 UTC
(In reply to Humble Chirammal from comment #10)
> Karthick/Prasanth, can you ack for deferring this bug from the release?

Ack. We can move this out of 3.10.

