Bug 1607551 - New block pvc creations of 1 GB failing when available free size in block-hosting volume=1GB
Summary: New block pvc creations of 1 GB failing when available free size in block-hosting volume=1GB
Keywords:
Status: CLOSED DUPLICATE of bug 1596018
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michael Adam
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-07-23 18:07 UTC by Neha Berry
Modified: 2018-07-26 11:58 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-26 11:58:18 UTC
Embargoed:



Description Neha Berry 2018-07-23 18:07:49 UTC
Description of problem:
++++++++++++++++++++++++++

We were scaling up to create at least 100 app pods with block devices bind-mounted to them.
Each PVC+pod pair was created at an interval of 10 s, and each PVC requested 1 GB.

While creating the 90th to 100th PVCs in the namespace "fiotest", new block device creations failed with "No space left on device". "oc describe pvc" listed the following error message:

"Failed to provision volume with StorageClass "gluster-block": failed to create volume: [heketi] failed to create volume: Unable to execute command on glusterfs-storage-fj674:"




Note: 
++++++++

1. The system had one block-hosting volume, 8317f50bf66dd1bf02a1d7de68ee280a, which now has only 1 GB free. Hence, for subsequent block device creations, a new block-hosting volume should have been created automatically; the requests should not have failed with "No space left on device".

Could it be that the creations failed because the free size in 8317f50bf66dd1bf02a1d7de68ee280a exactly equals the requested PVC size of 1 GB? See the sketch at the end of this note.

2. The PVC requests were also 1 GB each. The available space in 8317f50bf66dd1bf02a1d7de68ee280a was 7 GB before we started creating 10 new devices at 10 s intervals.

6 PVCs were created successfully and the subsequent 4 failed. The current available size in 8317f50bf66dd1bf02a1d7de68ee280a is 1 GB.
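
As an illustration of the suspected boundary condition, here is a minimal sketch in Go (heketi's language); the names hostingVolume and pickHostingVolume are hypothetical and this is not heketi's actual code. If the allocator accepts "free space >= requested size" as a fit, a 1 GiB request is routed to the hosting volume that has exactly 1 GiB free, and the gluster-block create then fails with "No space left on device" because the block file needs some headroom beyond its raw size.

package main

import "fmt"

// hostingVolume models only the fields needed for this illustration.
type hostingVolume struct {
	id      string
	freeGiB uint64
}

// pickHostingVolume returns the first volume whose reported free space is at
// least the requested size. With ">=", a request that exactly equals the free
// space still matches, leaving no room for any on-disk overhead.
func pickHostingVolume(vols []hostingVolume, reqGiB uint64) (string, bool) {
	for _, v := range vols {
		if v.freeGiB >= reqGiB { // suspected boundary: equality is accepted
			return v.id, true
		}
	}
	return "", false
}

func main() {
	vols := []hostingVolume{{id: "vol_8317f50bf66dd1bf02a1d7de68ee280a", freeGiB: 1}}
	id, ok := pickHostingVolume(vols, 1)
	fmt.Println(id, ok) // picks the almost-full volume instead of creating a new one
}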


1. Error message from heketi
++++++++++++++++++++++++++++

[kubeexec] ERROR 2018/07/23 16:27:13 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block create vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202  ha 3 auth enable prealloc full 10.70.46.1,10.70.46.175,10.70.46.75 1GiB --json] on glusterfs-storage-fj674: Err[command terminated with exit code 28]: Stdout [{ "RESULT": "FAIL", "errCode": 28, "errMsg": "Not able to create storage for vol_8317f50bf66dd1bf02a1d7de68ee280a\/blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 [No space left on device]" }
]: Stderr []
[kubeexec] ERROR 2018/07/23 16:27:13 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block delete vol_8317f50bf66dd1bf02a1d7de68ee280a/blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 --json] on glusterfs-storage-fj674: Err[command terminated with exit code 2]: Stdout [{ "RESULT": "FAIL", "errCode": 2, "errMsg": "block vol_8317f50bf66dd1bf02a1d7de68ee280a\/blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 doesn't exist" }
]: Stderr []
[cmdexec] ERROR 2018/07/23 16:27:13 /src/github.com/heketi/heketi/executors/cmdexec/block_volume.go:102: Unable to delete volume blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202: Unable to execute command on glusterfs-storage-fj674:
[heketi] ERROR 2018/07/23 16:27:13 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:816: Error executing create block volume: Unable to execute command on glusterfs-storage-fj674:
[cmdexec] INFO 2018/07/23 16:27:13 Check Glusterd service status in node dhcp46-1.lab.eng.blr.redhat.com


2. Error message from gluster pod of glusterfs-storage-fj674
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

175,10.70.46.75 authmode=1 size=1073741824, rbsize=0 [at block_svc_routines.c+3778 :<block_create_cli_1_svc_st>]
[2018-07-23 16:27:13.373883] ERROR: failed while creating block file in gluster volume volume: vol_8317f50bf66dd1bf02a1d7de68ee280a block: blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 file: 5354b6f8-038f-4a2e-8e30-d5d5fd6e684c host: 10.70.46.1,10.70.46.175,10.70.46.75 [at block_svc_routines.c+3868 :<block_create_cli_1_svc_st>]
[2018-07-23 16:27:13.518944] INFO: delete cli request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 [at block_svc_routines.c+4493 :<block_delete_cli_1_svc_st>]
[2018-07-23 16:27:13.523342] ERROR: block with name blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 doesn't exist in the volume vol_8317f50bf66dd1bf02a1d7de68ee280a [at block_svc_routines.c+4528 :<block_delete_cli_1_svc_st>]
[2018-07-23 16:27:13.812625] INFO: delete cli request, volume=vol_8317f50bf66dd1bf02a1d7de68ee280a blockname=blk_fiotest_pvcam3jjv0vwr_401e17c6-8e95-11e8-888e-0a580a810202 [at block_svc_routines.c+4493 :<block_delete_cli_1_svc_st>]


These create requests didn't even reach the other glusterfs pods.

3. Error message from oc describe pvc
+++++++++++++++++++++++++++++++++++++

[root@dhcp47-178 openshift_scalability]# for i in `oc get pvc -n fiotest|grep -i pending |awk '{print$1}' `; do echo $i; echo +++++++++++++; oc describe pvc $i -n fiotest; echo ""; done
pvc8otjs11sbc
+++++++++++++
Name:          pvc8otjs11sbc
Namespace:     fiotest
StorageClass:  gluster-block
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   control-plane.alpha.kubernetes.io/leader={"holderIdentity":"5b1157a2-8e51-11e8-888e-0a580a810202","leaseDurationSeconds":15,"acquireTime":"2018-07-23T16:28:13Z","renewTime":"2018-07-23T16:45:27Z","lea...
               volume.beta.kubernetes.io/storage-class=gluster-block
               volume.beta.kubernetes.io/storage-provisioner=gluster.org/glusterblock
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
Events:
  Type     Reason                Age                 From                                                           Message
  ----     ------                ----                ----                                                           -------
  Normal   Provisioning          38m (x14 over 53m)  gluster.org/glusterblock 5b1157a2-8e51-11e8-888e-0a580a810202  External provisioner is provisioning volume for claim "fiotest/pvc8otjs11sbc"
  Warning  ProvisioningFailed    38m (x14 over 53m)  gluster.org/glusterblock 5b1157a2-8e51-11e8-888e-0a580a810202  Failed to provision volume with StorageClass "gluster-block": failed to create volume: [heketi] failed to create volume: Unable to execute command on glusterfs-storage-fj674:
  Normal   ExternalProvisioning  3m (x461 over 53m)  persistentvolume-controller                                    waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock" or manually created by system administrator


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Note: All services were up and running in the 3 glusterfs pods


4. Loop used to create PVCs #90 to #100 under fiotest
---------------------------------------------------------------

[root@dhcp47-178 openshift_scalability]# python cluster-loader.py -f content/fio/fio-parameters.yaml && date
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8

forking fiotest
project.project.openshift.io/fiotest

templates:  [{'num': 10, 'file': './content/fio/fio-template.json', 'parameters': [{'STORAGE_CLASS': 'gluster-block'}, {'STORAGE_SIZE': '1Gi'}, {'MOUNT_PATH': '/mnt/pvcmount'}, {'DOCKER_IMAGE': 'r7perffio'}]}]
persistentvolumeclaim "pvct06rlmfdbe" created
pod "fio-pod-tshdz" created

persistentvolumeclaim "pvcnniamy8cel" created   <----successfully created 
pod "fio-pod-brrl4" created

persistentvolumeclaim "pvcg50nakvc1l" created    <----successfully created
pod "fio-pod-rkjkr" created

persistentvolumeclaim "pvcwuafokgfam" created     <----successfully created
pod "fio-pod-jxddb" created

persistentvolumeclaim "pvcae5cmlfiad" created     <----successfully created
pod "fio-pod-gqbth" created

persistentvolumeclaim "pvcl4lylp2rvh" created     <----successfully created
pod "fio-pod-dh4w4" created

persistentvolumeclaim "pvcam3jjv0vwr" created     <----- no backend block device created
pod "fio-pod-q59dr" created

persistentvolumeclaim "pvcvqw113p5na" created     <----- no backend block device created
pod "fio-pod-bc4p9" created

persistentvolumeclaim "pvcbggembg3pu" created     <----- no backend block device created
pod "fio-pod-rsjxc" created

persistentvolumeclaim "pvc8otjs11sbc" created     <----- no backend block device created
pod "fio-pod-zfr7r" created

Mon Jul 23 21:58:33 IST 2018


------------------------------------------------------------------------

**********************************************************************



Version-Release number of selected component (if applicable):
++++++++++++++++++++++++++
[root@dhcp47-178 ~]# oc version
oc v3.10.18
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-178.lab.eng.blr.redhat.com:8443
openshift v3.10.18
kubernetes v1.10.0+b81c8f8

[root@dhcp47-178 ~]# oc rsh glusterfs-storage-
glusterfs-storage-4ffb2  glusterfs-storage-9bjx9  glusterfs-storage-fj674  
[root@dhcp47-178 ~]# oc rsh glusterfs-storage-4ffb2 rpm -qa|grep gluster
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
gluster-block-0.2.1-22.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64

[root@dhcp47-178 ~]# oc rsh heketi-storage-1-6st7z rpm -qa|grep heketi
python-heketi-7.0.0-4.el7rhgs.x86_64
heketi-client-7.0.0-4.el7rhgs.x86_64
heketi-7.0.0-4.el7rhgs.x86_64
[root@dhcp47-178 ~]# 



How reproducible:
++++++++++++++++++++++++++
Once

Steps to Reproduce:
++++++++++++++++++++++++++
1. Start a script to create PVCs of 1 GB when the free size of the lone block-hosting volume present is also 1 GB (a sketch of such a loop is shown below the steps).
2. Confirm whether a new block-hosting volume is created and the 1 GB PVCs are carved out of it.
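
The sketch referenced in step 1, in Go for consistency with the other sketches here: it assumes an oc client on PATH and a hypothetical 1 GiB gluster-block PVC manifest pvc-1gi.yaml that uses metadata.generateName so it can be submitted repeatedly. The actual run used cluster-loader.py as shown in section 4 above.

package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	// Request ten more 1 GiB gluster-block PVCs at 10 s intervals while the
	// lone block-hosting volume has only 1 GB free.
	for i := 0; i < 10; i++ {
		out, err := exec.Command("oc", "create", "-f", "pvc-1gi.yaml", "-n", "fiotest").CombinedOutput()
		log.Printf("create #%d: %s err=%v", i+1, out, err)
		time.Sleep(10 * time.Second)
	}
}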

Actual results:
++++++++++++++++++++++++++
With only 1 GB free in the lone block-hosting volume, a new 1 GB block device could not be created in the setup ("No space left on device").
A new block-hosting volume was also not created to provide space for new block devices.

Expected results:
++++++++++++++++++++++++++
When there is insufficient space in the existing block-hosting volume, a new block-hosting volume should be created to satisfy subsequent block device create requests.

It seems we hit this issue because the free size in vol_8317f50bf66dd1bf02a1d7de68ee280a (100 GB in size) is 1 GB and the new PVC request is also 1 GB. Instead of creating a new block-hosting volume, heketi tried to provision the 1 GB PVC from the existing volume and failed.
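
For illustration only, a minimal sketch of the fallback we expected; the names createBlockDevice and newHostingVolume are hypothetical, not heketi's actual implementation. When no existing block-hosting volume has enough headroom for the request, a new hosting volume should be created and the block device carved out of it, rather than returning "No space left on device".

package main

import "fmt"

type hostingVolume struct {
	id      string
	freeGiB uint64
}

const headroomGiB = 1 // assumed reserve so a request never consumes the last byte of a hosting volume

// createBlockDevice places the request on an existing hosting volume only if it
// has the requested size plus headroom free; otherwise it falls back to
// creating a new hosting volume instead of failing the request.
func createBlockDevice(vols []hostingVolume, reqGiB uint64) string {
	for _, v := range vols {
		if v.freeGiB >= reqGiB+headroomGiB {
			return v.id
		}
	}
	return newHostingVolume(100).id // e.g. another 100 GiB block-hosting volume
}

func newHostingVolume(sizeGiB uint64) hostingVolume {
	return hostingVolume{id: "vol_new_block_hosting", freeGiB: sizeGiB}
}

func main() {
	vols := []hostingVolume{{id: "vol_8317f50bf66dd1bf02a1d7de68ee280a", freeGiB: 1}}
	fmt.Println(createBlockDevice(vols, 1)) // expected: a freshly created hosting volume serves the request
}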

