Description of problem:
When 300 block PVCs are created in a for loop in a CNS system, many block devices are created with a size of 0 GB and no actual values set. This leads to a mismatch in the volume count between heketi and the actual gluster-block device list. Of the 3 block hosting volumes available, the issue is seen only on the block hosting volume 'vol_3589da219d6536edb00cf7b533976e25'.

NAME: blockvol_500485782ebda0fdffa05fbea3c53da2
VOLUME:
GBID:
SIZE: 0.0 B
HA: 0
PASSWORD:
EXPORTED ON:

NAME: blockvol_07d566097b87535f30d6c37dde268780
VOLUME:
GBID:
SIZE: 0.0 B
HA: 0
PASSWORD:
EXPORTED ON:

[kubeexec] ERROR 2018/08/19 14:17:47 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block create vol_3589da219d6536edb00cf7b533976e25/blockvol_500485782ebda0fdffa05fbea3c53da2 ha 3 auth enable prealloc full 10.70.46.152,10.70.47.54,10.70.47.183 1GiB --json] on glusterfs-storage-nr58s: Err[command terminated with exit code 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "Failed to update transaction log for vol_3589da219d6536edb00cf7b533976e25\/blockvol_500485782ebda0fdffa05fbea3c53da2[No space left on device]" }

This could possibly be due to heketi not having the right data on free space and available space.

Version-Release number of selected component (if applicable):
gluster-block-0.2.1-24.el7rhgs.x86_64
heketi-7.0.0-6.el7rhgs.x86_64

How reproducible:
1/1 - Tried only once

Steps to Reproduce:
1. Create 300 block PVCs in a for loop:
   for i in {1..300}; do oc new-app mongodb-persistent-template.json --param=DATABASE_SERVICE_NAME=mongodb-block-$i --param=VOLUME_CAPACITY=1Gi; done

Actual results:
Many block devices are created with a size of 0 GB, and the block device count in heketi does not match the gluster-block count.

Expected results:
No mismatch, and block devices are cleaned up when creation fails.

Additional info:
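The zero-size entries in the description can be picked out mechanically. The following is a minimal sketch, not part of the original report. It assumes the info text has one field per line in the form shown above; zero_size_blockvols is a hypothetical helper name.

```shell
# Hypothetical helper (not a gluster-block command): reads info-style text on
# stdin, one "FIELD: value" per line, and prints the NAME of every entry
# whose SIZE line reads exactly "0.0 B".
zero_size_blockvols() {
    awk '
        $1 == "NAME:" { name = $2 }                          # remember the current entry
        $1 == "SIZE:" && $2 == "0.0" && $3 == "B" { print name }
    '
}
```

If the info output for each block device on the affected hosting volume is concatenated into one file, piping that file through zero_size_blockvols would list the half-created devices that are candidates for cleanup.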
Created attachment 1476922 [details] heketi_logs
Created attachment 1476923 [details] volume_information
Created attachment 1477857 [details] db_dump
ACK, this is a blocker; we need to fix it. This is not a regression caused by the recent fixes; it is an older bug. According to John, who will provide details, heketi updates its free space calculation too late.
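To illustrate what "too late" means here, the sketch below deducts the requested size from a free-space counter before the create command runs and returns it if the command fails. This is only an illustration of the ordering, not heketi's actual code or the actual fix; free_gib and all function names are hypothetical.

```shell
# free_gib is a hypothetical stand-in for the "freesize" value heketi keeps
# for a block hosting volume, in GiB.
free_gib=3

# Deduct the requested size up front; fail if there is not enough room.
reserve_block_space() {
    if [ "$1" -gt "$free_gib" ]; then
        return 1
    fi
    free_gib=$((free_gib - $1))
}

# Give the space back, for example after a failed create.
release_block_space() {
    free_gib=$((free_gib + $1))
}

# Usage: create_with_reservation SIZE_GIB COMMAND [ARGS...]
# Reserves first, then runs the command, and releases the reservation
# if the command fails.
create_with_reservation() {
    cwr_size=$1
    shift
    reserve_block_space "$cwr_size" || return 1
    "$@" || { release_block_space "$cwr_size"; return 1; }
}
```

With this ordering, a request that does not fit is rejected before any create command is issued, and a failed create leaves the counter unchanged.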
For the block hosting volume vol_3589da219d6536edb00cf7b533976e25, we have the following within the heketi db:

"freesize": 1,
"reservedsize": 2,
block-volume count = 97

~~~

On a gluster pod we find (trimmed df output):

10.70.47.183:vol_3589da219d6536edb00cf7b533976e25 100G 100G 0 100%

# gluster-block list vol_3589da219d6536edb00cf7b533976e25 | grep blockvol_ | wc -l
17879

# ls -lh /var/lib/heketi/mounts/vg_f89b9b3b7340e500f2c6367273182b28/brick_73ccc3351fe2b8705b443f0bbe3ed284/brick/block-meta/ | wc -l
17956

So not only is heketi allowing more block volumes than it should, gluster-block has vastly more volumes than heketi knows about.

I've identified two flaws in the implementation of block volume create in heketi that could lead to heketi trying to create more block volumes than it should. However, I'm not sure that this many volumes could have been created at the gluster-block level. If I find anything more I'll update this bz.
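The difference between the two counts can be turned into a list of names. The sketch below assumes two plain-text files with one block volume name per line, one taken from heketi's records and one from the gluster-block list output. The file names and the helper unknown_to_heketi are hypothetical.

```shell
# Hypothetical helper: $1 is a file of block volume names known to heketi,
# $2 is a file of names reported by gluster-block. Prints every name that
# appears in $2 but not in $1, sorted and de-duplicated.
unknown_to_heketi() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$1" "$2" | sort -u
}
```

For example, unknown_to_heketi heketi_names.txt gluster_names.txt would print the devices that exist at the gluster-block level without a matching heketi entry.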
I verified this bug on the container images below:

rhgs-server-rhel7 3.4.0-4
rhgs-volmanager-rhel7 3.4.0-4
rhgs-gluster-block-prov-rhel7 3.4.0-3

I was able to create 300 block devices without any issues, and only 4 block hosting volumes were present, which is expected.

[root@dhcp47-105 ~]# heketi-cli volume list
Id:4f414916b2b96ac003fff140b087968b Cluster:6f9b495a4068d35a4ab4df60fd94d723 Name:vol_4f414916b2b96ac003fff140b087968b [block]
Id:50f65b5bc0ea95f00c84e08ce696e859 Cluster:6f9b495a4068d35a4ab4df60fd94d723 Name:vol_50f65b5bc0ea95f00c84e08ce696e859 [block]
Id:cdba045d63f6ca47eb902b7af5fb7d5a Cluster:6f9b495a4068d35a4ab4df60fd94d723 Name:vol_cdba045d63f6ca47eb902b7af5fb7d5a [block]
Id:d65773e9cb116ff1fd982bd3a465a0c4 Cluster:6f9b495a4068d35a4ab4df60fd94d723 Name:heketidbstorage
Id:f1e0e46685cff4d8287cb2f662633495 Cluster:6f9b495a4068d35a4ab4df60fd94d723 Name:vol_f1e0e46685cff4d8287cb2f662633495 [block]

[root@dhcp47-105 ~]# heketi-cli blockvolume list | wc -l
303

[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | wc -l
300

[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | grep Bound | wc -l
300

Hence marking this as verified.
I have updated the doc text; kindly review.
Doc Text looks OK
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2686