Description of problem:
I was creating block PVCs in batches of 25 with a delay of 10 seconds. The first 50 PVCs were bound successfully, but the next batch of 25 returned a "server busy" error. Even after waiting, 7 of those 25 PVCs have remained in Pending state for 20 hours. A similar bug, BZ 1601904, exists for glusterfs; raising a new bug to report the issue in gluster-block.

Version-Release number of selected component (if applicable):
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-23.el7rhgs.x86_64
heketi-7.0.0-5.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1. Create block PVCs in batches of 25 with a delay of 10 seconds between batches.

Actual results:
PVCs have been in Pending state for 20 hours:

[root@dhcp47-74 ~]# oc get pvc | grep Pend
pvc1bc723rqmf   Pending   block-sc   20h
pvc5wuij2l0gu   Pending   block-sc   20h
pvca2i6bp6kmp   Pending   block-sc   20h
pvcsc82000mvw   Pending   block-sc   20h
pvcx354g4r3ul   Pending   block-sc   20h
pvcypse3hxb60   Pending   block-sc   20h
pvcysqemf0ww3   Pending   block-sc   20h

Expected results:
PVCs should not remain in Pending state for 20 hours.

Additional info:
Logs and Sosreports :-> http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1612013/
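The batch-creation step can be sketched as a shell loop. The storage class name "block-sc" and the 25-per-batch / 10-second cadence come from this report; the PVC names, the 1Gi size request, and the /tmp paths are illustrative assumptions, not the exact script that was used.

```shell
#!/bin/sh
# Sketch of the reproduction step: submit PVCs in batches of 25 with a
# 10-second pause between batches. "block-sc" is the storage class from
# the report; names, sizes, and paths below are made up for illustration.
BATCHES=3
for batch in $(seq 1 "$BATCHES"); do
  for i in $(seq 1 25); do
    name="pvc-batch${batch}-${i}"
    cat > "/tmp/${name}.yaml" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  storageClassName: block-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
    # Submit only when the oc client is available on PATH.
    if command -v oc >/dev/null 2>&1; then
      oc create -f "/tmp/${name}.yaml"
    fi
  done
  echo "batch ${batch} of ${BATCHES} submitted"
  if [ "$batch" -lt "$BATCHES" ]; then
    sleep 10
  fi
done
```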
It looks like the gluster cluster is not in a good state; need to look into this in detail.

[cmdexec] INFO 2018/08/03 09:40:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: unexpected EOF
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:40:38 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:40:38 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
[heketi] INFO 2018/08/03 09:40:38 Cleaned 0 nodes from health cache
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:40:38 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Starting Node Health Status refresh
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp47-6.lab.eng.blr.redhat.com
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 25a62f65f04d287530d0022f17c9a439 up=false
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-167.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 2cec134709d4eedd115df888a98dfc99 up=false
[cmdexec] INFO 2018/08/03 09:42:01 Check Glusterd service status in node dhcp46-174.lab.eng.blr.redhat.com
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:298: Get https://172.31.0.1:443/api/v1/namespaces/glusterfs/pods?labelSelector=glusterfs-node: dial tcp 172.31.0.1:443: getsockopt: connection refused
[kubeexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:299: Failed to get list of pods
[cmdexec] ERROR 2018/08/03 09:42:01 /src/github.com/heketi/heketi/executors/cmdexec/peer.go:76: Failed to get list of pods
[heketi] INFO 2018/08/03 09:42:01 Periodic health check status: node 406044bb661bef3017c56dd25fdb01b4 up=false
Humble, agreed. To me it may be more than just the gluster components of the cluster. This heketi instance does not appear to be able to connect to the k8s API to determine which pods to talk to. Can you curl from inside the heketi pod to https://172.31.0.1:443?
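A minimal sketch of that check, to be run from a shell inside the heketi pod (e.g. via "oc rsh <heketi-pod>"). The 172.31.0.1:443 address comes from the heketi log above; the service-account token path and the /healthz endpoint are standard Kubernetes conventions assumed here, not taken from the logs.

```shell
#!/bin/sh
# Connectivity probe toward the Kubernetes API, from inside the heketi pod.
# 172.31.0.1:443 is the API service address seen in the heketi log; the
# token path and /healthz endpoint are standard Kubernetes conventions.
API="https://${KUBERNETES_SERVICE_HOST:-172.31.0.1}:${KUBERNETES_SERVICE_PORT:-443}"
TOKEN=/var/run/secrets/kubernetes.io/serviceaccount/token
echo "probing ${API}"
if [ -r "$TOKEN" ]; then
  # -k skips TLS verification; --max-time avoids hanging on a dead endpoint.
  curl -sk --max-time 5 -H "Authorization: Bearer $(cat "$TOKEN")" "${API}/healthz" \
    || echo "API unreachable"
else
  echo "no service-account token found; run this from inside a pod"
fi
```

A "connection refused" or timeout here would confirm the problem is between the heketi pod and the API server rather than in gluster itself.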
Closing as per comment 5.