Bug 1619017 - mismatch in number of block devices between heketi and gluster
Summary: mismatch in number of block devices between heketi and gluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: John Mulligan
QA Contact: Nitin Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568862
Reported: 2018-08-19 14:59 UTC by krishnaram Karthick
Modified: 2018-11-16 07:46 UTC (History)

Fixed In Version: heketi-7.0.0-7.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, the storage space used by a block volume was not deducted from the block hosting volume until the block volume was created. If two block volumes were requested at the same time, Heketi would allow both volume requests to pass to the underlying storage system. This led to unnecessary out-of-space errors. With this fix, Heketi reserves the space needed for the block volume before requesting the block volume from the underlying storage system. Heketi no longer requests more than one block volume if the free space on the block hosting volume can contain only one volume.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:51 UTC
Target Upstream Version:


Attachments
heketi_logs (6.19 MB, text/plain)
2018-08-19 17:09 UTC, krishnaram Karthick
volume_information (19.56 KB, application/x-gzip)
2018-08-19 17:11 UTC, krishnaram Karthick
db_dump (153.33 KB, text/plain)
2018-08-22 11:56 UTC, krishnaram Karthick


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2686 0 None None None 2018-09-12 09:24:40 UTC

Description krishnaram Karthick 2018-08-19 14:59:33 UTC
Description of problem:
When 300 block PVCs are created in a for loop in a CNS system, many block devices are created with a size of 0 GB and no other values set. This leads to a mismatch in the volume count between heketi and the actual gluster-block device list.

Out of the 3 block hosting volumes available, this issue is seen only on 'vol_3589da219d6536edb00cf7b533976e25' block hosting volume. 

NAME: blockvol_500485782ebda0fdffa05fbea3c53da2
VOLUME: 
GBID: 
SIZE: 0.0 B
HA: 0
PASSWORD: 
EXPORTED ON:
NAME: blockvol_07d566097b87535f30d6c37dde268780
VOLUME: 
GBID: 
SIZE: 0.0 B
HA: 0
PASSWORD: 
EXPORTED ON:
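
One way to pick such empty entries out of a larger listing is sketched below. This is only an illustration: it assumes the info output has been captured as plain `KEY: value` text exactly as pasted above, and the helper names `parse_block_info` and `is_incomplete` are made up for this example, not part of gluster-block or heketi.

```python
def parse_block_info(text):
    """Split `KEY: value` lines into one dict per block device.

    A NAME line starts a new record; lines without a colon are skipped.
    """
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "NAME":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def is_incomplete(record):
    # Assumption: a device with neither a GBID nor a backing VOLUME
    # was never fully created.
    return not record.get("GBID") and not record.get("VOLUME")


# The two records quoted in this report.
sample = """\
NAME: blockvol_500485782ebda0fdffa05fbea3c53da2
VOLUME:
GBID:
SIZE: 0.0 B
HA: 0
PASSWORD:
EXPORTED ON:
NAME: blockvol_07d566097b87535f30d6c37dde268780
VOLUME:
GBID:
SIZE: 0.0 B
HA: 0
PASSWORD:
EXPORTED ON:
"""

records = parse_block_info(sample)
incomplete = [r["NAME"] for r in records if is_incomplete(r)]
print(len(incomplete), "of", len(records), "records look incomplete")
```

Both records quoted above are flagged, since neither has a GBID or a VOLUME value.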

[kubeexec] ERROR 2018/08/19 14:17:47 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster-block create vol_3589da219d6536edb00cf7b533976e25/blockvol_500485782ebda0fdffa05fbea3c53da2  ha 3 auth enable prealloc full 10.70.46.152,10.70.47.54,10.70.47.183 1GiB --json] on glusterfs-storage-nr58s: Err[command terminated with exit code 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "Failed to update transaction log for vol_3589da219d6536edb00cf7b533976e25\/blockvol_500485782ebda0fdffa05fbea3c53da2[No space left on device]" }

This could possibly be due to heketi not having accurate data on free and available space.

Version-Release number of selected component (if applicable):
gluster-block-0.2.1-24.el7rhgs.x86_64
heketi-7.0.0-6.el7rhgs.x86_64

How reproducible:
1/1 - Tried only once

Steps to Reproduce:
1. Create 300 block PVCs in a for loop:
   for i in {1..300}; do oc new-app mongodb-persistent-template.json --param=DATABASE_SERVICE_NAME=mongodb-block-$i --param=VOLUME_CAPACITY=1Gi; done


Actual results:
Many block devices of 0 GB are created, and the block device count in heketi does not match the count in gluster-block.

Expected results:
No mismatch, and block devices should be cleaned up on failure.

Additional info:

Comment 2 krishnaram Karthick 2018-08-19 17:09:17 UTC
Created attachment 1476922 [details]
heketi_logs

Comment 3 krishnaram Karthick 2018-08-19 17:11:59 UTC
Created attachment 1476923 [details]
volume_information

Comment 7 krishnaram Karthick 2018-08-22 11:56:36 UTC
Created attachment 1477857 [details]
db_dump

Comment 9 Michael Adam 2018-08-22 20:36:12 UTC
ACK, this is a blocker. We need to fix it.

This is not a regression caused by the recent fixes.
This is an older bug. According to John, who will provide details,
it's heketi updating the free space calculation too late.
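
The effect of updating free space too late can be sketched as follows. This is a simplified model, not heketi's code (heketi is written in Go); the class `BlockHostingVolume` and its methods are hypothetical names chosen for illustration. It contrasts a check-only flow, where nothing is deducted until after creation, with a flow that reserves the space up front.

```python
import threading


class BlockHostingVolume:
    """Toy model of the free-space accounting for one block hosting volume."""

    def __init__(self, free_gib):
        self.free_gib = free_gib
        self._lock = threading.Lock()

    def has_room(self, size_gib):
        # Late-update flow: only a check. The size is deducted later,
        # after the block volume exists on the storage side.
        return self.free_gib >= size_gib

    def reserve(self, size_gib):
        # Reserve-first flow: check and deduct in one locked step,
        # before anything is requested from the storage side.
        with self._lock:
            if self.free_gib < size_gib:
                return False
            self.free_gib -= size_gib
            return True

    def release(self, size_gib):
        # Give the space back if the create request fails.
        with self._lock:
            self.free_gib += size_gib


# Two 1 GiB requests arrive together; only 1 GiB is free.
late = BlockHostingVolume(free_gib=1)
late_results = [late.has_room(1), late.has_room(1)]

early = BlockHostingVolume(free_gib=1)
early_results = [early.reserve(1), early.reserve(1)]

print("late update:  ", late_results)
print("reserve first:", early_results)
```

In the late-update flow both requests pass the check because neither has been deducted yet; with a reservation, the second request is refused before it reaches the storage layer.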

Comment 10 John Mulligan 2018-08-22 21:42:24 UTC
For the block hosting volume vol_3589da219d6536edb00cf7b533976e25:
  we have the following within the heketi db:
    "freesize": 1,
    "reservedsize": 2,
    block-volume count = 97
    ~~~
  
  we find that on a gluster pod:
    (trimmed df output)
    10.70.47.183:vol_3589da219d6536edb00cf7b533976e25  100G  100G 0 100%

    # gluster-block list vol_3589da219d6536edb00cf7b533976e25 | grep blockvol_ | wc -l
    17879

    # ls -lh /var/lib/heketi/mounts/vg_f89b9b3b7340e500f2c6367273182b28/brick_73ccc3351fe2b8705b443f0bbe3ed284/brick/block-meta/  | wc -l
    17956

So not only is heketi allowing more block volumes than it should, gluster-block has vastly more volumes than heketi knows about.

I've identified two flaws in the implementation of block volume create in heketi that could lead to heketi trying to create more block volumes than it should. However, I'm not sure this many volumes could have been created at the gluster-block level.

If I find anything more I'll update this bz.
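
A comparison like the one above can also be done by name rather than by count. The sketch below assumes the two name lists have already been collected, for example from the `gluster-block list` output and from the heketi db dump. The function `compare_block_volumes` and the sample names are hypothetical and do not come from this system.

```python
def compare_block_volumes(heketi_names, gluster_names):
    """Return the block volume names that appear on only one side."""
    heketi, gluster = set(heketi_names), set(gluster_names)
    return {
        "only_in_gluster": sorted(gluster - heketi),
        "only_in_heketi": sorted(heketi - gluster),
    }


# Made-up names, for illustration only.
heketi_names = ["blockvol_a", "blockvol_b"]
gluster_names = ["blockvol_a", "blockvol_b", "blockvol_c", "blockvol_d"]

diff = compare_block_volumes(heketi_names, gluster_names)
print(len(diff["only_in_gluster"]), "volumes are unknown to heketi")
```

Names that appear only on the gluster side are candidates for leftovers from failed create requests.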

Comment 16 Nitin Goyal 2018-08-29 02:27:54 UTC
I verified this bug on the container images below:

rhgs-server-rhel7               3.4.0-4
rhgs-volmanager-rhel7           3.4.0-4
rhgs-gluster-block-prov-rhel7   3.4.0-3


I was able to create 300 block devices without any issues, and only 4 block hosting volumes were present, which is expected.

[root@dhcp47-105 ~]# heketi-cli  volume list
Id:4f414916b2b96ac003fff140b087968b    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_4f414916b2b96ac003fff140b087968b [block]
Id:50f65b5bc0ea95f00c84e08ce696e859    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_50f65b5bc0ea95f00c84e08ce696e859 [block]
Id:cdba045d63f6ca47eb902b7af5fb7d5a    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_cdba045d63f6ca47eb902b7af5fb7d5a [block]
Id:d65773e9cb116ff1fd982bd3a465a0c4    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:heketidbstorage
Id:f1e0e46685cff4d8287cb2f662633495    Cluster:6f9b495a4068d35a4ab4df60fd94d723    Name:vol_f1e0e46685cff4d8287cb2f662633495 [block]


[root@dhcp47-105 ~]# heketi-cli blockvolume list | wc -l
303
[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | wc -l
300
[root@dhcp47-105 ~]# oc get pvc | grep mongodb-block | grep Bound | wc -l
300

Hence marking this as verified.

Comment 17 Anjana KD 2018-09-07 08:52:44 UTC
I have updated the doc text. Kindly review.

Comment 18 John Mulligan 2018-09-07 16:43:23 UTC
Doc Text looks OK

Comment 20 errata-xmlrpc 2018-09-12 09:23:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

