Description of problem:
++++++++++++++++++++++++++
In a 5-node OCP cluster with 3 of the nodes serving as CNS, we were creating PVCs in batches of 10 and binding them to app pods, with a sleep of 10 sec between each PVC create. In the batch for the 80th to 90th PVCs, the creations of PVC #84 through PVC #90 were stuck in "Pending" state for almost 20 mins. As seen from the heketi logs, no POST requests (gluster-block create) reached heketi in that time frame. But after about 20 mins, all 7 pending PVCs were created successfully within a matter of seconds: they moved into "Bound" state, and the PVC, PV, heketi, and gluster-block device counts all finally reached 91.

It therefore looks like, even though the oc PVC create commands were run, no requests to create the volumes actually reached heketi for about 20 mins.

Note: In the mentioned time frame of 20 mins (2018/07/23 13:43:13 UTC to 2018/07/23 14:05:36 UTC), periodic health checks were running fine in the heketi pod, along with GET requests. Hence, could it be that oc could not manage to send the create requests to heketi for about 20 mins?
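For reference, a rough shell equivalent of one batch of the workload (the actual run used cluster-loader.py with the fio template shown further below; the manifest here is a simplified stand-in, and the matching app pod create is omitted):

# One batch: 10 PVCs against the gluster-block storage class,
# with a 10 sec sleep between creates.
for i in $(seq 1 10); do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  generateName: pvc
spec:
  storageClassName: gluster-block
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
  sleep 10
done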
The requests sent to heketi and time frames
===========================================
[kubeexec] DEBUG 2018/07/23 13:43:13 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246:
  Host: dhcp46-1.lab.eng.blr.redhat.com
  Pod: glusterfs-storage-fj674
  Command: gluster-block create vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvc8v2hgcf47e_55261bc7-8e7e-11e8-888e-0a580a810202 ha 3 auth enable prealloc full 10.70.46.1,10.70.46.175,10.70.46.75 1GiB --json
  Result: { "IQN": "iqn.2016-12.org.gluster-block:a48abcf6-6eca-4ee2-8529-26ee118e4ac0", "USERNAME": "a48abcf6-6eca-4ee2-8529-26ee118e4ac0", "PASSWORD": "38ab9166-0232-4f83-80cb-6658ec4dfa00", "PORTAL(S)": [ "10.70.46.1:3260", "10.70.46.175:3260", "10.70.46.75:3260" ], "RESULT": "SUCCESS" }
[heketi] INFO 2018/07/23 13:43:13 Create Block Volume succeeded

[kubeexec] DEBUG 2018/07/23 14:05:36 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246:
  Host: dhcp46-1.lab.eng.blr.redhat.com
  Pod: glusterfs-storage-fj674
  Command: gluster-block create vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvcrr5keo41js_75dca5f4-8e81-11e8-888e-0a580a810202 ha 3 auth enable prealloc full 10.70.46.1,10.70.46.175,10.70.46.75 1GiB --json
  Result: { "IQN": "iqn.2016-12.org.gluster-block:d28cafcf-5765-4cd3-925c-76bb7ec71ddb", "USERNAME": "d28cafcf-5765-4cd3-925c-76bb7ec71ddb", "PASSWORD": "d4fdf7c7-fd11-431b-a28b-70d90e4178c7", "PORTAL(S)": [ "10.70.46.1:3260", "10.70.46.175:3260", "10.70.46.75:3260" ], "RESULT": "SUCCESS" }

The command used to create the 80th to 90th PVC+app pod
-------------------------------------------------------
[root@dhcp47-178 openshift_scalability]# python cluster-loader.py -f content/fio/fio-parameters.yaml && date
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8
forking fiotest
project.project.openshift.io/fiotest
templates: [{'num': 10, 'file': './content/fio/fio-template.json', 'parameters': [{'STORAGE_CLASS': 'gluster-block'}, {'STORAGE_SIZE': '1Gi'}, {'MOUNT_PATH': '/mnt/pvcmount'}, {'DOCKER_IMAGE': 'r7perffio'}]}]
persistentvolumeclaim "pvck3fq6mx8lb" created
pod "fio-pod-j44m9" created
persistentvolumeclaim "pvc7bcu5ryh8q" created
pod "fio-pod-jt7q2" created
persistentvolumeclaim "pvc8v2hgcf47e" created
pod "fio-pod-ppmfl" created
persistentvolumeclaim "pvciilomtvr60" created
pod "fio-pod-9s5j5" created
persistentvolumeclaim "pvceksh62yahg" created
pod "fio-pod-ch5lr" created
persistentvolumeclaim "pvcrr5keo41js" created
pod "fio-pod-rlwkm" created
persistentvolumeclaim "pvcmwjuvdojc0" created
pod "fio-pod-wbvpx" created
persistentvolumeclaim "pvckpuvgytjei" created
pod "fio-pod-fhzzj" created
persistentvolumeclaim "pvcjqxs3jx072" created
pod "fio-pod-b47sn" created
persistentvolumeclaim "pvc1e8wh7cfxu" created
pod "fio-pod-z8f2d" created
Mon Jul 23 19:15:49 IST 2018
[root@dhcp47-178 openshift_scalability]#
**********************************************************************

The time when the 83rd PVC was created (from gluster-blockd logs)
-----------------------------------------------------------------
[2018-07-23 13:43:09.422498] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvc8v2hgcf47e_55261bc7-8e7e-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=a48abcf6-6eca-4ee2-8529-26ee118e4ac0 authmode=1 passwd=38ab9166-0232-4f83-80cb-6658ec4dfa00 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 13:43:12.626683] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]

Time from the glusterfs pod when the pending PVCs finally started getting created
---------------------------------------------------------------------------------
[2018-07-23 14:05:32.912977] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvcrr5keo41js_75dca5f4-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=d28cafcf-5765-4cd3-925c-76bb7ec71ddb authmode=1 passwd=d4fdf7c7-fd11-431b-a28b-70d90e4178c7 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:05:36.571840] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
[2018-07-23 14:05:38.357607] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvciilomtvr60_75dcbc61-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=d5737056-f29c-4a4f-8d6c-013c7885c7dd authmode=1 passwd=ccac922f-93d2-4228-be0a-87307556520e size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:05:41.589666] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
[2018-07-23 14:07:24.338279] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvc1e8wh7cfxu_b854ffcb-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=3286f8c1-7145-414e-877a-48bda632f606 authmode=1 passwd=b3bb5f70-2acd-42d7-be38-5dfc6c79bb30 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:07:27.711834] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
^C

Some outputs
++++++++++++
[root@dhcp47-178 ~]# oc get pvc --all-namespaces -w | grep -i pending ; date
fiotest   pvc1e8wh7cfxu   Pending   gluster-block   20m
fiotest   pvceksh62yahg   Pending   gluster-block   22m
fiotest   pvciilomtvr60   Pending   gluster-block   22m
fiotest   pvcjqxs3jx072   Pending   gluster-block   21m
fiotest   pvckpuvgytjei   Pending   gluster-block   21m
fiotest   pvcmwjuvdojc0   Pending   gluster-block   21m
fiotest   pvcrr5keo41js   Pending   gluster-block   22m
fiotest   pvc1e8wh7cfxu   Pending   gluster-block   21m
fiotest   pvcrr5keo41js   Pending   pvc-788e226d-8e7e-11e8-863e-005056a5aac9   0   gluster-block   23m
fiotest   pvckpuvgytjei   Pending   gluster-block   22m
fiotest   pvciilomtvr60   Pending   pvc-600abde1-8e7e-11e8-863e-005056a5aac9   0   gluster-block   23m
fiotest   pvc1e8wh7cfxu   Pending   pvc-a93f5d13-8e7e-11e8-863e-005056a5aac9   0   gluster-block   22m
fiotest   pvcjqxs3jx072   Pending   gluster-block   22m
----------------------------

PVC counts while the requests for the 7 PVCs had not yet reached heketi
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[root@dhcp47-178 ~]# heketi-cli blockvolume list | wc -l ; date
84
Mon Jul 23 19:19:06 IST 2018
----------------------------
sh-4.2# for i in `gluster volume list | grep vol`; do echo $i; echo +++++++++++++++++; gluster-block list $i | wc -l; echo ""; done
84
sh-4.2# date
Mon Jul 23 13:49:22 UTC 2018
----------------------------
[root@dhcp47-178 100pods]# oc get pv --all-namespaces | wc -l; date
85
Mon Jul 23 19:33:45 IST 2018
----------------------------
[root@dhcp47-178 100pods]# oc get pvc --all-namespaces | wc -l; date
92
Mon Jul 23 19:33:52 IST 2018
----------------------------
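For convenience, the four counts above can be gathered in one pass with a small sketch like the following (not part of the original run; the glusterfs pod and volume names are the ones from this cluster and would differ elsewhere):

#!/bin/bash
# Print PVC/PV/heketi/gluster-block counts side by side to spot any skew.
echo "pvc:           $(oc get pvc --all-namespaces --no-headers | wc -l)"
echo "pv:            $(oc get pv --no-headers | wc -l)"
echo "heketi:        $(heketi-cli blockvolume list | wc -l)"
# gluster-block only runs inside the glusterfs pods, hence the rsh.
echo "gluster-block: $(oc rsh glusterfs-storage-fj674 gluster-block list vol_8317f50bf66dd1bf02a1d7de68ee280a | wc -l)"
date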
Note: Finally, all the volumes were created successfully and are in Bound state. But the reason why no POST requests reached heketi for 20 mins is unknown.

Current volume count - once all pending pvcs are bound
++++++++++++++++++++
[root@dhcp47-178 ~]# heketi-cli blockvolume list | wc -l && date
91
Mon Jul 23 20:51:41 IST 2018
[root@dhcp47-178 ~]#
[root@dhcp47-178 ~]# oc rsh glusterfs-storage-fj674
sh-4.2# gluster v list
heketidbstorage
vol_8317f50bf66dd1bf02a1d7de68ee280a
sh-4.2# gluster-block list vol_8317f50bf66dd1bf02a1d7de68ee280a | wc -l && date
91
Mon Jul 23 15:21:38 UTC 2018
sh-4.2#
[root@dhcp47-178 100pods]# oc get pvc --all-namespaces | grep -v NAME | wc -l && date
91
Mon Jul 23 20:52:03 IST 2018
[root@dhcp47-178 100pods]# oc get pv --all-namespaces | grep -v NAME | wc -l && date
91
Mon Jul 23 20:52:08 IST 2018
[root@dhcp47-178 100pods]#

Version-Release number of selected component (if applicable):
++++++++++++++++++++++++++
[root@dhcp47-178 ~]# oc version
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8

[root@dhcp47-178 ~]# oc rsh glusterfs-storage-
glusterfs-storage-4ffb2  glusterfs-storage-9bjx9  glusterfs-storage-fj674
[root@dhcp47-178 ~]# oc rsh glusterfs-storage-4ffb2 rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-22.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64

[root@dhcp47-178 ~]# oc rsh heketi-storage-1-6st7z rpm -qa | grep heketi
python-heketi-7.0.0-4.el7rhgs.x86_64
heketi-client-7.0.0-4.el7rhgs.x86_64
heketi-7.0.0-4.el7rhgs.x86_64
[root@dhcp47-178 ~]#

How reproducible:
++++++++++++++++++++++++++
2x2

Steps to Reproduce:
++++++++++++++++++++++++++
1. In a 3-node CNS cluster, start a loop/script (see the sketch after the problem description above) that creates PVCs plus app pods (with those PVCs attached) in batches of 10.
2. Check the PVC status in each batch and confirm that all PVC create requests reach heketi within a few seconds (a verification sketch follows the Expected results below).
3. In the loop creating the 80th to 90th PVCs, check the PVC status and the heketi logs:
(i) Even though the PVC create commands were executed for the 84th to 90th PVCs, those 7 PVCs stayed in Pending state for almost 20 mins.
(ii) The requests to create the block devices reached heketi only after 20 mins, and then the PVCs were created successfully.
(iii) For almost 20 mins, the PVC count was 91 but the PV count was 84, as 7 PVCs stayed in Pending state.

Actual results:
++++++++++++++++++++++++++
In the loop creating the 80th to 90th PVCs, checking the PVC status and the heketi logs showed:
(i) Even though the PVC create commands were executed for the 84th to 90th PVCs, those 7 PVCs stayed in Pending state for almost 20 mins.
(ii) The requests to create the block devices reached heketi only after 20 mins, and then the PVCs were created successfully.
(iii) For almost 20 mins, the PVC count was 91 but the PV count was 84, as 7 PVCs stayed in Pending state.

Expected results:
++++++++++++++++++++++++++
The creation requests for PVCs #1 to #83 reached heketi almost immediately, and those PVCs became Bound within seconds. Likewise, for the 84th to 90th PVCs there should not have been a lag of almost 20 mins before the PVCs actually got created and reached the Bound state.
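As a quick check for step 2 above, the following two-terminal sketch watches PVC phases while tailing heketi for incoming create activity (it assumes heketi runs behind a deploymentconfig named heketi-storage, as the pod name heketi-storage-1-6st7z suggests, and greps for the "Create Block Volume" message seen in the logs above):

# Terminal 1: watch PVC phase transitions.
oc get pvc --all-namespaces -w

# Terminal 2: tail heketi for block-volume create activity. If PVCs sit in
# Pending while nothing new appears here, the create requests are not arriving.
oc logs -f dc/heketi-storage | grep --line-buffered "Create Block Volume"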
Fixed in version: rhgs-gluster-block-prov-container-3.3.1-23
Today's build is supposed to have an improvement in this code path:
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-gluster-block-prov-rhel7:3.4.0-2

Please verify the result with the above image. I am moving the bug to ON_QA.
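Before re-testing, it can help to confirm which provisioner image the cluster is actually running. A minimal sketch (the label selector below is an assumption and may differ per deployment):

# List gluster-block provisioner pods and the image each one runs.
oc get pods --all-namespaces -l glusterfs=block-provisioner-pod \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'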
*** Bug 1668629 has been marked as a duplicate of this bug. ***
*** Bug 1700493 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3260