Bug 1619017 - mismatch in number of block devices between heketi and gluster
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assigned To: John Mulligan
QA Contact: Nitin Goyal
Depends On:
Blocks: 1568862
 
Reported: 2018-08-19 10:59 EDT by krishnaram Karthick
Modified: 2018-09-12 05:24 EDT
CC List: 10 users

See Also:
Fixed In Version: heketi-7.0.0-7.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, the storage space used by a block volume was not deducted from the block hosting volume until the block volume was created. If two block volumes were requested at the same time, Heketi would allow both requests to pass to the underlying storage system, which led to unnecessary out-of-space errors. With this fix, Heketi reserves the space needed for a block volume before requesting the block volume from the underlying storage system. Heketi no longer requests more than one block volume when the free space on the block hosting volume can hold only one.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-12 05:23:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
heketi_logs (6.19 MB, text/plain)
2018-08-19 13:09 EDT, krishnaram Karthick
volume_information (19.56 KB, application/x-gzip)
2018-08-19 13:11 EDT, krishnaram Karthick
db_dump (153.33 KB, text/plain)
2018-08-22 07:56 EDT, krishnaram Karthick


External Trackers
Tracker                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHEA-2018:2686  None      None    None     2018-09-12 05:24 EDT

Description krishnaram Karthick 2018-08-19 10:59:33 EDT
Description of problem:
When 300 block PVCs are created in a for loop on a CNS system, many block devices are created with a size of 0 B and no other attributes set. This leads to a mismatch between the block volume count in heketi and the actual gluster-block device list.

Of the 3 block hosting volumes available, this issue is seen only on the 'vol_3589da219d6536edb00cf7b533976e25' block hosting volume.

NAME: blockvol_500485782ebda0fdffa05fbea3c53da2
VOLUME: 
GBID: 
SIZE: 0.0 B
HA: 0
PASSWORD: 
EXPORTED ON:
NAME: blockvol_07d566097b87535f30d6c37dde268780
VOLUME: 
GBID: 
SIZE: 0.0 B
HA: 0
PASSWORD: 
EXPORTED ON:

[kubeexec] ERROR 2018/08/19 14:17:47 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block create vol_3589da219d6536edb00cf7b533976e25/blockvol_500485782ebda0fdffa05fbea3c53da2  ha 3 auth enable prealloc full 10.70.46.152,10.70.47.54,10.70.47.183 1GiB --json] on glusterfs-storage-nr58s: Err[command terminated with exit code 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "Failed to update transaction log for vol_3589da219d6536edb00cf7b533976e25\/blockvol_500485782ebda0fdffa05fbea3c53da2[No space left on device]" }

This is possibly due to heketi not having accurate data on free and available space.

Version-Release number of selected component (if applicable):
gluster-block-0.2.1-24.el7rhgs.x86_64
heketi-7.0.0-6.el7rhgs.x86_64

How reproducible:
1/1 - Tried only once

Steps to Reproduce:
1. Create 300 block PVCs in a for loop: for i in {1..300}; do oc new-app mongodb-persistent-template.json --param=DATABASE_SERVICE_NAME=mongodb-block-$i --param=VOLUME_CAPACITY=1Gi; done


Actual results:
Many block devices are created with a size of 0 B, and the block device counts in heketi and gluster-block do not match.

Expected results:
The counts should match, and block devices should be cleaned up when creation fails.

Additional info:
Comment 2 krishnaram Karthick 2018-08-19 13:09 EDT
Created attachment 1476922
heketi_logs
Comment 3 krishnaram Karthick 2018-08-19 13:11 EDT
Created attachment 1476923
volume_information
Comment 7 krishnaram Karthick 2018-08-22 07:56 EDT
Created attachment 1477857
db_dump
Comment 9 Michael Adam 2018-08-22 16:36:12 EDT
ACK, this is a blocker. We need to fix it.

This is not a regression caused by the recent fixes.
It is an older bug. According to John, who will provide details,
heketi updates the free space calculation too late.
Comment 10 John Mulligan 2018-08-22 17:42:24 EDT
For the block hosting volume vol_3589da219d6536edb00cf7b533976e25:
  we have the following within the heketi db:
    "freesize": 1,
    "reservedsize": 2,
    block-volume count = 97
  
  we find that on a gluster pod:
    (trimmed df output)
    10.70.47.183:vol_3589da219d6536edb00cf7b533976e25  100G  100G 0 100%

    # gluster-block list vol_3589da219d6536edb00cf7b533976e25 | grep blockvol_ | wc -l
    17879

    # ls -lh /var/lib/heketi/mounts/vg_f89b9b3b7340e500f2c6367273182b28/brick_73ccc3351fe2b8705b443f0bbe3ed284/brick/block-meta/  | wc -l
    17956

So not only is heketi allowing more block volumes than it should (97 in the heketi db), but gluster-block also has vastly more volumes (17879) than heketi knows about.

I've identified two flaws in the implementation of block volume create in heketi that could lead to heketi trying to create more block volumes than it should; however, I'm not sure this many volumes could have been created at the gluster-block level.

If I find anything more I'll update this bz.
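The failure described in Comment 9 and in the Doc Text is a check-then-act race: concurrent create requests all see the same free space on the block hosting volume, because the space is only deducted after the block volume exists. The sketch below contrasts that ordering with reserving space first and releasing it if the backend create fails. It is a minimal illustration, assuming a single in-memory counter per block hosting volume; the names BlockHostingVolume, try_reserve, release, admit_check_then_deduct and create_concurrently are hypothetical and are not heketi's API (heketi is written in Go and keeps free and reserved sizes in its database, as the freesize and reservedsize fields in Comment 10 show).

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BlockHostingVolume:
    """Hypothetical in-memory model of one block hosting volume's free space (GiB)."""
    name: str
    free_gib: int
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_reserve(self, size_gib: int) -> bool:
        """Atomically check for and deduct space BEFORE any backend request is made."""
        with self._lock:
            if self.free_gib < size_gib:
                return False
            self.free_gib -= size_gib
            return True

    def release(self, size_gib: int) -> None:
        """Give a reservation back, e.g. when the backend create fails."""
        with self._lock:
            self.free_gib += size_gib


def admit_check_then_deduct(free_gib: int, sizes: List[int]) -> int:
    """Old ordering, modelled deterministically: every request in a concurrent
    burst checks the same free space, because nothing is deducted until a
    block volume has actually been created."""
    return sum(1 for size in sizes if size <= free_gib)


def create_concurrently(
    bhv: BlockHostingVolume,
    sizes: List[int],
    create_on_backend: Callable[[int], bool],
) -> int:
    """Reserve-first ordering: reserve, then create, roll back on failure.
    Returns the number of block volumes successfully created."""

    def handle(size: int) -> bool:
        if not bhv.try_reserve(size):
            return False  # no room: a caller would pick or create another volume
        if create_on_backend(size):
            return True
        bhv.release(size)  # a failed create must not leak the reserved space
        return False

    with ThreadPoolExecutor(max_workers=8) as pool:
        return sum(pool.map(handle, sizes))


if __name__ == "__main__":
    # Five concurrent 1 GiB requests against 3 GiB of free space.
    print(admit_check_then_deduct(3, [1] * 5))  # prints 5
    bhv = BlockHostingVolume("vol_example", free_gib=3)
    print(create_concurrently(bhv, [1] * 5, lambda size: True), bhv.free_gib)  # prints 3 0
```

With the old ordering all five requests are admitted even though only three fit, which matches the "No space left on device" failures in the Description. With reservation first, exactly three are created regardless of thread scheduling, and a failing backend returns its reservation so the accounting does not drift. The sketch does not model gluster-block's own metadata, so it does not explain the extra volumes counted on the gluster side in Comment 10.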
Comment 16 Nitin Goyal 2018-08-28 22:27:54 EDT
I verified this bug with the following container images:

rhgs-server-rhel7               3.4.0-4
rhgs-volmanager-rhel7           3.4.0-4
rhgs-gluster-block-prov-rhel7   3.4.0-3


I was able to create 300 block devices without any issues, and only 4 block hosting volumes were created, which is expected.

[root@dhcp47-105 ~]# heketi-cli  volume list
Id:4f414916b2b96ac003fff140b087968b    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_4f414916b2b96ac003fff140b087968b [block]
Id:50f65b5bc0ea95f00c84e08ce696e859    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_50f65b5bc0ea95f00c84e08ce696e859 [block]
Id:cdba045d63f6ca47eb902b7af5fb7d5a    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_cdba045d63f6ca47eb902b7af5fb7d5a [block]
Id:d65773e9cb116ff1fd982bd3a465a0c4    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:heketidbstorage
Id:f1e0e46685cff4d8287cb2f662633495    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_f1e0e46685cff4d8287cb2f662633495 [block]


[root@dhcp47-105 ~]# heketi-cli blockvolume list | wc -l
303
[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | wc -l
300
[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | grep Bound | wc -l
300

Hence marking this as verified.
Comment 17 Anjana 2018-09-07 04:52:44 EDT
I have updated the doc text. Please review.
Comment 18 John Mulligan 2018-09-07 12:43:23 EDT
Doc Text looks OK
Comment 20 errata-xmlrpc 2018-09-12 05:23:51 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686
