Bug 1612013

Summary: block PVC creation stuck in Pending state for 20 hours
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nitin Goyal <nigoyal>
Component: heketi
Assignee: John Mulligan <jmulligan>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Nitin Goyal <nigoyal>
Severity: high
Docs Contact:
Priority: high
Version: cns-3.10
CC: hchiramm, kramdoss, madam, nchilaka, nigoyal, rhs-bugs, rtalur, sankarshan, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-10 14:53:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1568862

Description Nitin Goyal 2018-08-03 09:01:27 UTC
Description of problem:
I was creating block PVCs in batches of 25 with a delay of 10 seconds between batches. The first 50 PVCs were bound successfully, but the next batch of 25 returned a "server busy" error. After some time, 7 of those 25 PVCs were still in the Pending state, and have remained there for 20 hours.

A similar bug, 1601904, exists for glusterfs volumes; this bug is raised to report the same issue for gluster-block.


Version-Release number of selected component (if applicable):
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-23.el7rhgs.x86_64
heketi-7.0.0-5.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
1. Create block PVCs in batches of 25 with a delay of 10 seconds between batches (see the sketch below).
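
A rough sketch of the reproduction loop (the PVC name prefix, size, and access mode are illustrative assumptions; only the storage class block-sc is taken from the oc output below):

for batch in 1 2 3; do
  for i in $(seq 1 25); do
    # Hypothetical PVC manifest; adjust name and size as needed.
    cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc-${batch}-${i}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: block-sc
EOF
  done
  # 10 second delay between batches, as in the step above.
  sleep 10
done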

Actual results:
PVCs are in the Pending state and have remained so for 20 hours:
[root@dhcp47-74 ~]# oc get pvc | grep Pend 
pvc1bc723rqmf   Pending                                                                        block-sc       20h
pvc5wuij2l0gu   Pending                                                                        block-sc       20h
pvca2i6bp6kmp   Pending                                                                        block-sc       20h
pvcsc82000mvw   Pending                                                                        block-sc       20h
pvcx354g4r3ul   Pending                                                                        block-sc       20h
pvcypse3hxb60   Pending                                                                        block-sc       20h
pvcysqemf0ww3   Pending                                                                        block-sc       20h


Expected results:
PVCs should not remain in the Pending state for 20 hours.

Additional info:

Comment 2 Nitin Goyal 2018-08-03 11:21:17 UTC
Logs and sosreports:

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1612013/

Comment 3 Humble Chirammal 2018-08-07 11:57:01 UTC
It looks like the gluster cluster is not in a good state; this needs to be looked into in detail.

[cmdexec] INFO 2018/08/03 09:40:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: unexpected EOF
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
[heketi] INFO 2018/08/03 09:40:38 Cleaned 0 nodes from health cache
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods

[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Starting Node Health Status refresh
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false


[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods

[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
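
As a rough manual cross-check of the gluster side (the namespace and label selector are taken from the heketi log above; <gluster-pod> is a placeholder for a name from the pod list):

# List the gluster pods heketi is trying to reach.
oc get pods -n glusterfs -l glusterfs-node -o wide

# Check glusterd and peer status inside one of them.
oc exec -n glusterfs <gluster-pod> -- systemctl status glusterd
oc exec -n glusterfs <gluster-pod> -- gluster pool list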

Comment 4 John Mulligan 2018-08-08 18:35:08 UTC
Humble, agreed. To me it looks like more than just the gluster components of the cluster: this heketi instance does not appear to be able to connect to the Kubernetes API to determine which pods to talk to.

Can you curl https://172.31.0.1:443 from inside the heketi pod?
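
Something along these lines should be enough to tell whether the Kubernetes service IP is reachable at all (pod name and namespace are placeholders; -k just skips certificate verification for the check):

# Open a shell in the heketi pod and probe the API endpoint from there.
oc rsh -n <heketi-namespace> <heketi-pod>
curl -kv https://172.31.0.1:443/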

Comment 6 Raghavendra Talur 2018-08-10 14:53:25 UTC
Closing as per comment 5