Bug 1607520 - While creating 100 block PVCs in loops of 10, 7 (84th to 90th) PVCs stayed in PENDING state for 20 mins
Summary: While creating 100 block PVCs in loops of 10, 7 (84th to 90th) PVCs stayed in PENDING state for 20 mins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-gluster-block-prov-container
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 3.11.z Batch Update 4
Assignee: Humble Chirammal
QA Contact: Rachael
URL:
Whiteboard:
Duplicates: 1668629 1700493 (view as bug list)
Depends On:
Blocks: 1669979 1707226
 
Reported: 2018-07-23 16:04 UTC by Neha Berry
Modified: 2019-10-30 12:39 UTC (History)
CC List: 12 users

Fixed In Version: rhgs-gluster-block-prov-rhel7:3.11.4-5
Doc Type: Bug Fix
Doc Text:
Earlier, some PVC requests stayed in the Pending state for a long duration because of an issue in the controller library used by the glusterblock provisioner. With this release, the issue has been fixed.
Clone Of:
Environment:
Last Closed: 2019-10-30 12:39:11 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3260 0 None None None 2019-10-30 12:39:19 UTC

Description Neha Berry 2018-07-23 16:04:16 UTC
Description of problem:
++++++++++++++++++++++++++
In a 5-node OCP cluster with 3 CNS nodes, we were creating PVCs in a loop of 10 at a time and binding them to app pods.

The PVCs were created in batches of 10, with a 10-second sleep between each PVC creation.
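For reference, the same creation pattern can be reproduced with a plain oc loop such as the minimal sketch below. The actual run used cluster-loader.py (shown further down in this description); the inline PVC manifest here is only a stand-in for the fio template, and the per-PVC app pod is omitted.

# Sketch: 10 batches of 10 PVCs against the gluster-block storage class,
# with a 10 s sleep between creations. Assumes the fiotest project already
# exists; the real test used cluster-loader.py and also created an fio pod
# for every PVC.
for batch in $(seq 1 10); do
  for n in $(seq 1 10); do
    cat <<EOF | oc -n fiotest create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  generateName: pvc-
spec:
  storageClassName: gluster-block
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi
EOF
    sleep 10
  done
  echo "batch ${batch}: $(oc -n fiotest get pvc --no-headers | grep -c Bound) PVCs Bound"
done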

In the batch for the 80th to 90th PVCs, the creations of PVC #84 to PVC #90 were stuck in the "Pending" state for almost 20 mins. As seen from the heketi logs, no heketi-cli (gluster-block create) POST requests arrived in that time frame. But after about 20 mins, all 7 pending PVCs were created successfully in a matter of seconds.
Thus, after 20 mins, all 7 pending PVCs were created and moved to the "Bound" state, and the PVC + PV + heketi + gluster-block device counts finally reached 91.

Thus it looks like, even though the oc PVC create commands were run, no requests to create the corresponding volumes actually reached heketi for about 20 mins.

Note: In the mentioned time frame of 20 mins (2018/07/23 13:43:13 UTC to 2018/07/23 14:05:36 UTC), periodic health checks were running fine in the heketi pod, along with GET requests. Hence, could it be that oc (or the provisioner) could not manage to send the create requests to heketi for about 20 mins?
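The delay can also be seen from the OpenShift side in the event stream of one of the stuck claims; a minimal sketch, using one of the claim names listed later in this report:

# Show the provisioning events recorded on a stuck claim.
oc -n fiotest describe pvc pvcrr5keo41js | sed -n '/Events:/,$p'

# Namespace-wide view of provisioning-related events, sorted by time.
oc -n fiotest get events --sort-by=.lastTimestamp | grep -i provision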


The requests sent to heketi and time frames
===========================================


[kubeexec] DEBUG 2018/07/23 13:43:13 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-1.lab.eng.blr.redhat.com Pod: glusterfs-storage-fj674 Command: gluster-block create vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvc8v2hgcf47e_55261bc7-8e7e-11e8-888e-0a580a810202  ha 3 auth enable prealloc full 10.70.46.1,10.70.46.175,10.70.46.75 1GiB --json
Result: { "IQN": "iqn.2016-12.org.gluster-block:a48abcf6-6eca-4ee2-8529-26ee118e4ac0", "USERNAME": "a48abcf6-6eca-4ee2-8529-26ee118e4ac0", "PASSWORD": "38ab9166-0232-4f83-80cb-6658ec4dfa00", "PORTAL(S)": [ "10.70.46.1:3260", "10.70.46.175:3260", "10.70.46.75:3260" ], "RESULT": "SUCCESS" }
[heketi] INFO 2018/07/23 13:43:13 Create Block Volume succeeded


[kubeexec] DEBUG 2018/07/23 14:05:36 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-1.lab.eng.blr.redhat.com Pod: glusterfs-storage-fj674 Command: gluster-block create vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvcrr5keo41js_75dca5f4-8e81-11e8-888e-0a580a810202  ha 3 auth enable prealloc full 10.70.46.1,10.70.46.175,10.70.46.75 1GiB --json
Result: { "IQN": "iqn.2016-12.org.gluster-block:d28cafcf-5765-4cd3-925c-76bb7ec71ddb", "USERNAME": "d28cafcf-5765-4cd3-925c-76bb7ec71ddb", "PASSWORD": "d4fdf7c7-fd11-431b-a28b-70d90e4178c7", "PORTAL(S)": [ "10.70.46.1:3260", "10.70.46.175:3260", "10.70.46.75:3260" ], "RESULT": "SUCCESS" }



The command used to create the 80th to 90th PVCs + app pods
----------------------------------------
[root@dhcp47-178 openshift_scalability]# python cluster-loader.py -f content/fio/fio-parameters.yaml && date
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8

forking fiotest
project.project.openshift.io/fiotest

templates:  [{'num': 10, 'file': './content/fio/fio-template.json', 'parameters': [{'STORAGE_CLASS': 'gluster-block'}, {'STORAGE_SIZE': '1Gi'}, {'MOUNT_PATH': '/mnt/pvcmount'}, {'DOCKER_IMAGE': 'r7perffio'}]}]
persistentvolumeclaim "pvck3fq6mx8lb" created
pod "fio-pod-j44m9" created

persistentvolumeclaim "pvc7bcu5ryh8q" created
pod "fio-pod-jt7q2" created

persistentvolumeclaim "pvc8v2hgcf47e" created
pod "fio-pod-ppmfl" created

persistentvolumeclaim "pvciilomtvr60" created
pod "fio-pod-9s5j5" created

persistentvolumeclaim "pvceksh62yahg" created
pod "fio-pod-ch5lr" created

persistentvolumeclaim "pvcrr5keo41js" created
pod "fio-pod-rlwkm" created

persistentvolumeclaim "pvcmwjuvdojc0" created
pod "fio-pod-wbvpx" created

persistentvolumeclaim "pvckpuvgytjei" created
pod "fio-pod-fhzzj" created

persistentvolumeclaim "pvcjqxs3jx072" created
pod "fio-pod-b47sn" created

persistentvolumeclaim "pvc1e8wh7cfxu" created
pod "fio-pod-z8f2d" created

Mon Jul 23 19:15:49 IST 2018
[root@dhcp47-178 openshift_scalability]# 

**********************************************************************

The time when the 83rd PVC was created (from gluster-blockd logs)
-----------------------------------
[2018-07-23 13:43:09.422498] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvc8v2hgcf47e_55261bc7-8e7e-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=a48abcf6-6eca-4ee2-8529-26ee118e4ac0 authmode=1 passwd=38ab9166-0232-4f83-80cb-6658ec4dfa00 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 13:43:12.626683] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]

Time (from the glusterfs pod) when the pending PVCs finally started getting created:
--------------------------------------------------------------------------------

[2018-07-23 14:05:32.912977] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvcrr5keo41js_75dca5f4-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=d28cafcf-5765-4cd3-925c-76bb7ec71ddb authmode=1 passwd=d4fdf7c7-fd11-431b-a28b-70d90e4178c7 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:05:36.571840] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
[2018-07-23 14:05:38.357607] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvciilomtvr60_75dcbc61-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=d5737056-f29c-4a4f-8d6c-013c7885c7dd authmode=1 passwd=ccac922f-93d2-4228-be0a-87307556520e size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:05:41.589666] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
[2018-07-23 14:07:24.338279] INFO: create request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvc1e8wh7cfxu_b854ffcb-8e81-11e8-888e-0a580a810202 blockhosts=10.70.46.1,10.70.46.175,10.70.46.75 filename=3286f8c1-7145-414e-877a-48bda632f606 authmode=1 passwd=b3bb5f70-2acd-42d7-be38-5dfc6c79bb30 size=1073741824 [at block_svc_routines.c+4191 :<block_create_common>]
[2018-07-23 14:07:27.711834] INFO: command exit code, 0 [at block_svc_routines.c+4367 :<block_create_common>]
^C


Some outputs
++++++++++++
[root@dhcp47-178 ~]# oc get pvc --all-namespaces -w|grep -i pending ; date
fiotest     pvc1e8wh7cfxu   Pending                                                                        gluster-block   20m
fiotest     pvceksh62yahg   Pending                                                                        gluster-block   22m
fiotest     pvciilomtvr60   Pending                                                                        gluster-block   22m
fiotest     pvcjqxs3jx072   Pending                                                                        gluster-block   21m
fiotest     pvckpuvgytjei   Pending                                                                        gluster-block   21m
fiotest     pvcmwjuvdojc0   Pending                                                                        gluster-block   21m
fiotest     pvcrr5keo41js   Pending                                                                        gluster-block   22m
fiotest   pvc1e8wh7cfxu   Pending                                 gluster-block   21m
fiotest   pvcrr5keo41js   Pending   pvc-788e226d-8e7e-11e8-863e-005056a5aac9   0                   gluster-block   23m
fiotest   pvckpuvgytjei   Pending                                 gluster-block   22m
fiotest   pvciilomtvr60   Pending   pvc-600abde1-8e7e-11e8-863e-005056a5aac9   0                   gluster-block   23m
fiotest   pvc1e8wh7cfxu   Pending   pvc-a93f5d13-8e7e-11e8-863e-005056a5aac9   0                   gluster-block   22m
fiotest   pvcjqxs3jx072   Pending                                 gluster-block   22m


----------------------------

PVC counts when the requests for the 7 PVCs had not yet reached heketi
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[root@dhcp47-178 ~]# heketi-cli blockvolume list|wc -l ; date
84
Mon Jul 23 19:19:06 IST 2018
----------------------------

sh-4.2# for i in `gluster volume list|grep vol`; do echo $i; echo +++++++++++++++++;  gluster-block list $i |wc -l; echo "";  done
84
sh-4.2# date
Mon Jul 23 13:49:22 UTC 2018

----------------------------

[root@dhcp47-178 100pods]# oc get pv --all-namespaces|wc -l; date
85
Mon Jul 23 19:33:45 IST 2018
----------------------------

[root@dhcp47-178 100pods]# oc get pvc --all-namespaces|wc -l; date
92
Mon Jul 23 19:33:52 IST 2018
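
Note: the two wc -l counts above include the NAME header row, so the actual counts at this point were 84 PVs and 91 PVCs, consistent with 7 claims still pending. The --no-headers option gives the raw counts directly (equivalent to the grep -v NAME variants used further below); a minimal sketch:

oc get pv --no-headers | wc -l
oc get pvc --all-namespaces --no-headers | wc -l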

----------------------------

Note: Eventually, all the volumes were created successfully and are in the Bound state. But the reason why no POST requests reached heketi for 20 mins is unknown.

Current volume counts - once all pending PVCs are bound
++++++++++++++++++++

[root@dhcp47-178 ~]# heketi-cli blockvolume list|wc -l && date
91
Mon Jul 23 20:51:41 IST 2018
[root@dhcp47-178 ~]# 


[root@dhcp47-178 ~]# oc rsh glusterfs-storage-fj674
sh-4.2# gluster v list
heketidbstorage
vol_8317f50bf66dd1bf02a1d7de68ee280a


sh-4.2# gluster-block list vol_8317f50bf66dd1bf02a1d7de68ee280a |wc -l && date
91
Mon Jul 23 15:21:38 UTC 2018
sh-4.2# 

[root@dhcp47-178 100pods]# oc get pvc --all-namespaces|grep -v NAME|wc -l && date
91
Mon Jul 23 20:52:03 IST 2018

[root@dhcp47-178 100pods]# oc get pv --all-namespaces|grep -v NAME|wc -l && date
91
Mon Jul 23 20:52:08 IST 2018
[root@dhcp47-178 100pods]# 



Version-Release number of selected component (if applicable):
++++++++++++++++++++++++++
[root@dhcp47-178 ~]# oc version
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8

[root@dhcp47-178 ~]# oc rsh glusterfs-storage-
glusterfs-storage-4ffb2  glusterfs-storage-9bjx9  glusterfs-storage-fj674  
[root@dhcp47-178 ~]# oc rsh glusterfs-storage-4ffb2 rpm -qa|grep gluster
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-22.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64

[root@dhcp47-178 ~]# oc rsh heketi-storage-1-6st7z rpm -qa|grep heketi
python-heketi-7.0.0-4.el7rhgs.x86_64
heketi-client-7.0.0-4.el7rhgs.x86_64
heketi-7.0.0-4.el7rhgs.x86_64
[root@dhcp47-178 ~]# 



How reproducible:
++++++++++++++++++++++++++
2x2

Steps to Reproduce:
++++++++++++++++++++++++++
1. In a 3-node CNS cluster, start a loop/script to create PVCs + app pods (with those PVCs attached) in batches of 10.
2. Check the PVC status in each batch and confirm that all PVC create requests reach heketi within a few seconds (a minimal status-check loop is sketched after this list).
3. In the loop creating the 80th to 90th PVCs, check the PVC status and the heketi logs:
  (i)   Even though the PVC create commands were executed for the 84th to 90th PVCs, those 7 PVCs stayed in the Pending state for almost 20 mins.
  (ii)  The requests to create the block devices reached heketi only after 20 mins, and then the PVCs were created successfully.
  (iii) For almost 20 mins, the PVC count was 91 but the PV count was 84, as 7 PVCs stayed in the Pending state.
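
A minimal status-check loop for steps 2 and 3, assuming heketi-cli is configured on the node where it runs (as in the outputs above):

# Compare the number of pending PVCs with heketi's block-volume count
# every 30 s while the batches are running.
while true; do
  date -u
  echo -n "pending PVCs:         "; oc get pvc --all-namespaces --no-headers | grep -c Pending
  echo -n "heketi block volumes: "; heketi-cli blockvolume list | wc -l
  sleep 30
done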



Actual results:
++++++++++++++++++++++++++
In the loop creating the 80th to 90th PVCs:
  (i)   Even though the PVC create commands were executed for the 84th to 90th PVCs, those 7 PVCs stayed in the Pending state for almost 20 mins.
  (ii)  The requests to create the block devices reached heketi only after 20 mins, and then the PVCs were created successfully.
  (iii) For almost 20 mins, the PVC count was 91 but the PV count was 84, as 7 PVCs stayed in the Pending state.


Expected results:
++++++++++++++++++++++++++
Requests for the creation of PVCs #1 to #83 reached heketi almost immediately, and those PVCs reached the Bound state within a few seconds.
Similarly, for the 84th to 90th PVCs there should not have been a lag of almost 20 mins before the volumes were actually created and the PVCs reached the Bound state.

Comment 16 Humble Chirammal 2018-08-01 14:11:37 UTC
Fixed in version: rhgs-gluster-block-prov-container-3.3.1-23

Comment 19 Humble Chirammal 2018-08-24 13:53:22 UTC
Today's build is supposed to have an improvement in this code path:

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-gluster-block-prov-rhel7:3.4.0-2

Please test with the above image and share the result. I am moving the bug to ON_QA.

Comment 34 John Mulligan 2019-05-16 12:16:43 UTC
*** Bug 1668629 has been marked as a duplicate of this bug. ***

Comment 42 Humble Chirammal 2019-08-01 10:16:58 UTC
*** Bug 1700493 has been marked as a duplicate of this bug. ***

Comment 46 errata-xmlrpc 2019-10-30 12:39:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3260

