Bug 1612013 - creation of block pvcs is in pending state since 20 hours
Summary: creation of block pvcs is in pending state since 20 hours
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: John Mulligan
QA Contact: Nitin Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-08-03 09:01 UTC by Nitin Goyal
Modified: 2018-08-27 12:02 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-10 14:53:25 UTC
Embargoed:



Description Nitin Goyal 2018-08-03 09:01:27 UTC
Description of problem:
I was creating block PVCs in batches of 25, with a delay of 10 seconds between batches. The first 50 PVCs were bound successfully, but the next batch of 25 returned a "server busy" error. After some time, 7 of those 25 PVCs were still in the Pending state, and they have now been pending for 20 hours.

We have a similar bug (1601904) for glusterfs; raising this new bug to report the issue in gluster-block.


Version-Release number of selected component (if applicable):
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-23.el7rhgs.x86_64
heketi-7.0.0-5.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
1. Create block PVCs in batches of 25, with a delay of 10 seconds between batches (see the sketch below).
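For reference, a minimal sketch of such a loop; the PVC names, size, and access mode here are assumptions, while block-sc is the storage class seen in the output below:

# Create one batch of 25 block PVCs against the block-sc storage class.
# (PVC names, size, and access mode are illustrative.)
for i in $(seq 1 25); do
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc-batch1-${i}
spec:
  storageClassName: block-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done
# Wait 10 seconds before submitting the next batch.
sleep 10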

Actual results:
PVCs have been stuck in the Pending state for 20 hours:
[root@dhcp47-74 ~]# oc get pvc | grep Pend 
pvc1bc723rqmf   Pending                                                                        block-sc       20h
pvc5wuij2l0gu   Pending                                                                        block-sc       20h
pvca2i6bp6kmp   Pending                                                                        block-sc       20h
pvcsc82000mvw   Pending                                                                        block-sc       20h
pvcx354g4r3ul   Pending                                                                        block-sc       20h
pvcypse3hxb60   Pending                                                                        block-sc       20h
pvcysqemf0ww3   Pending                                                                        block-sc       20h
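
To see why a given claim is stuck, describing it shows the provisioner events at the bottom, e.g. for one of the claims above:

oc describe pvc pvc1bc723rqmf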


Expected results:
All PVCs should be bound; none should remain in the Pending state for 20 hours.

Additional info:

Comment 2 Nitin Goyal 2018-08-03 11:21:17 UTC
Logs and sosreports:

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1612013/

Comment 3 Humble Chirammal 2018-08-07 11:57:01 UTC
It looks like the Gluster cluster is not in a good state; we need to look into this in more detail.

[cmdexec] INFO 2018/08/03 09:40:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: unexpected EOF
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
[heketi] INFO 2018/08/03 09:40:38 Cleaned 0 nodes from health cache
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Starting Node Health Status refresh
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
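
A quick way to confirm the gluster-side state from inside the cluster (the pod name is a placeholder; the glusterfs namespace and glusterfs-node label are taken from the log above):

# List the gluster pods heketi is trying to reach.
oc get pods -n glusterfs -l glusterfs-node -o wide

# Check peer and volume health from inside one of them.
oc exec -n glusterfs <glusterfs-pod> -- gluster peer status
oc exec -n glusterfs <glusterfs-pod> -- gluster volume status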

Comment 4 John Mulligan 2018-08-08 18:35:08 UTC
Humble, agreed. To me, it may be more than just the gluster components of the cluster: this heketi instance does not appear to be able to connect to the k8s API to determine which pods to talk to.

Can you curl https://172.31.0.1:443 from inside the heketi pod?
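
For example, something like the following, assuming curl is available in the heketi image (pod name and namespace are placeholders; -k skips certificate verification against the service IP):

oc exec -n <namespace> <heketi-pod> -- curl -k https://172.31.0.1:443/version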

Comment 6 Raghavendra Talur 2018-08-10 14:53:25 UTC
Closing as per comment 5.

