Bug 1612049 - Once BZ#1584123 is hit in older heketi-build-6.0.0.7-4, block volume creations failed in latest builds with heketi panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: John Mulligan
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-08-03 10:50 UTC by Neha Berry
Modified: 2019-02-11 10:20 UTC (History)

Fixed In Version: rhgs-volmanager-rhel7:3.4.0-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:49 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2686 0 None None None 2018-09-12 09:25:05 UTC

Description Neha Berry 2018-08-03 10:50:33 UTC
This bug is a corner case that occurs even though bug https://bugzilla.redhat.com/show_bug.cgi?id=1584123 is fixed in 7.0.0-2. It appears when an old heketi build (6.0.0.7-4) has left a block hosting volume with free size > actual size in its volume info, and heketi is then upgraded to 7.0.2 or beyond.

Bug Description
+++++++++++++++++++++

1. We had a CNS 3.9 LIVE setup with heketi version heketi-6.0.0-7.4 (also from live).
2. At the request of the gluster-block developers (during a block upgrade, we enter a state where the clients, which run in tcmu-runner, run different versions), we kept some pods on the CNS 3.9 rhgs version and some on the CNS 3.10 rhgs version, and checked the impact on I/O and on block device creations and deletions.

3. With heketi at version 6.0.0-7.4, the free size of one block hosting volume (shown as 172) was greater than its actual size (100). Thus, even though the volume was full, heketi still tried to use it for new block devices.
 
4. Upgraded heketi to version 7.0.5 as well.

5. The free size for the volume was still 172.


---------------------------------------

[root@dhcp42-137 ~]# heketi-cli volume info 33bed42d0e5eec32a53dd5b8ff35b509
Name: vol_33bed42d0e5eec32a53dd5b8ff35b509
Size: 100
Volume Id: 33bed42d0e5eec32a53dd5b8ff35b509
Cluster Id: 294d442c71f75cd44a2ca1a77c6716f4
Mount: 10.70.42.84:vol_33bed42d0e5eec32a53dd5b8ff35b509
Mount Options: backup-volfile-servers=10.70.41.217,10.70.42.223
Block: true
Free Size: 172

---------------------------------------

6. Tried creating new block devices with heketi at version 7.0.5, but each create fails with the following error:

heketi-cli blockvolume create --size=2 --ha=3 --name=hello1 --user=admin --secret=adminkey
command terminated with exit code 137

A panic was seen 
==================
[heketi] WARNING 2018/08/02 10:52:33 ModifyFreeSize: FreeSize[172], delta[-2]
panic: CHECK:
	func (github.com/heketi/heketi/apps/glusterfs.(*VolumeEntry).ModifyFreeSize) 0x14d6877
	File /builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:324

=======================================

The free size > actual size issue existed in 6.0.0.7-4 but is fixed in heketi-7.0.0-2 and beyond.

Could it be that, because the issue is already resolved in heketi 7.0.0.5 but was present in heketi-6.0.0-7.4, the new heketi build is unable to handle this pre-existing discrepancy?
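
To illustrate the suspected failure mode, here is a minimal Go sketch. It is not the heketi source: volumeEntry and modifyFreeSize are hypothetical names, and the invariant 0 <= FreeSize <= Size is an assumption about what the check at volume_entry.go:324 enforces. With the values from this report (Size 100, FreeSize 172, delta -2), the result is still above Size, so any such check fails on every create.

```go
package main

import "fmt"

// volumeEntry is a hypothetical stand-in for heketi's VolumeEntry.
// Only the two fields relevant to this report are modelled.
type volumeEntry struct {
	Size     int // total size of the block hosting volume
	FreeSize int // space the database believes is still available
}

// modifyFreeSize applies delta and enforces 0 <= FreeSize <= Size.
// Assumption: the real check enforces a similar invariant; here a
// violation is returned as an error instead of a panic.
func (v *volumeEntry) modifyFreeSize(delta int) error {
	next := v.FreeSize + delta
	if next < 0 || next > v.Size {
		return fmt.Errorf("free size %d out of range [0, %d] after delta %d",
			next, v.Size, delta)
	}
	v.FreeSize = next
	return nil
}

func main() {
	// Values taken from the volume info in this report.
	v := &volumeEntry{Size: 100, FreeSize: 172}
	if err := v.modifyFreeSize(-2); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Returning an error here is only a design choice for the sketch, so the stale value can be observed without killing the process; it does not describe the change shipped in the erratum.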


Your input will help. Kindly let us know of a workaround that can be used to recover from this scenario.


+++++++  If current CNS 3.9 setups already have the issue from BZ#1584123 and are upgraded to heketi 7.0.2 or beyond, new block device creations fail due to this free size discrepancy. ++++++++++++++++


error message seen 
++++++++++++++++++++++++++++++++++++++

[negroni] Started GET /queue/e6da72116453e2e8e13e363a5586edd5
[negroni] Completed 200 OK in 168.995µs
[negroni] Started GET /queue/e6da72116453e2e8e13e363a5586edd5
[negroni] Completed 200 OK in 158.44µs
[kubeexec] DEBUG 2018/08/02 10:52:33 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp42-84.lab.eng.blr.redhat.com Pod: glusterfs-storage-lzmlr Command: gluster-block create vol_33bed42d0e5eec32a53dd5b8ff35b509/hello1  ha 3 auth disable prealloc full 10.70.42.223,10.70.41.217,10.70.42.84 2GiB --json
Result: { "IQN": "iqn.2016-12.org.gluster-block:a0acf1f8-f425-42cd-9c59-715f2d77f2cc", "PORTAL(S)": [ "10.70.42.223:3260", "10.70.41.217:3260", "10.70.42.84:3260" ], "RESULT": "SUCCESS" }
[heketi] WARNING 2018/08/02 10:52:33 ModifyFreeSize: FreeSize[172], delta[-2]
panic: CHECK:
	func (github.com/heketi/heketi/apps/glusterfs.(*VolumeEntry).ModifyFreeSize) 0x14d6877
	File /builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:324

goroutine 74700 [running]:
github.com/lpabon/godbc.dbc_panic(0x189b924, 0x5, 0x0, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/lpabon/godbc/godbc.go:85 +0x40c
github.com/lpabon/godbc.Check(0xc4201f0b00, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/lpabon/godbc/godbc.go:123 +0x5d
github.com/heketi/heketi/apps/glusterfs.(*VolumeEntry).ModifyFreeSize(0xc4201f4000, 0xfffffffffffffffe)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:324 +0x178
github.com/heketi/heketi/apps/glusterfs.(*BlockVolumeEntry).saveCreateBlockVolume.func1(0xc420184ee0, 0x1716de0, 0x24f2501)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/block_volume_entry.go:275 +0x144
github.com/heketi/heketi/pkg/db.(*TxWrap).Update(0xc4202c5840, 0xc4202c5890, 0x1729e80, 0x1)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/pkg/db/wrap.go:87 +0x3f
github.com/heketi/heketi/apps/glusterfs.(*BlockVolumeEntry).saveCreateBlockVolume(0xc4201e0a90, 0x2458d60, 0xc4202c5840, 0x0, 0x24f25b0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/block_volume_entry.go:251 +0x72
github.com/heketi/heketi/apps/glusterfs.(*BlockVolumeCreateOperation).Finalize.func1(0xc420184ee0, 0x19148f8, 0xc420184ee0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/operations.go:846 +0x206
github.com/boltdb/bolt.(*DB).Update(0xc420342960, 0xc4202c5830, 0x0, 0x0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/boltdb/bolt/db.go:595 +0x9a
github.com/heketi/heketi/apps/glusterfs.(*BlockVolumeCreateOperation).Finalize(0xc4201766a0, 0x246e840, 0xc420256230)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/operations.go:823 +0x6f
github.com/heketi/heketi/apps/glusterfs.AsyncHttpOperation.func1(0x0, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/heketi/apps/glusterfs/operations_manage.go:116 +0x60f
github.com/heketi/rest.(*AsyncHttpHandler).handle.func1(0xc4207249c0, 0xc420bb62a0)
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/rest/asynchttp.go:291 +0xf6
created by github.com/heketi/rest.(*AsyncHttpHandler).handle
	/builddir/build/BUILD/heketi-7.0.0/src/github.com/heketi/rest/asynchttp.go:287 +0x49

-------------------------------------
+++++++++++++++++++++++++++++++++++++++++++++++

[root@dhcp42-137 ~]# heketi-cli volume info 33bed42d0e5eec32a53dd5b8ff35b509
Name: vol_33bed42d0e5eec32a53dd5b8ff35b509
Size: 100
Volume Id: 33bed42d0e5eec32a53dd5b8ff35b509
Cluster Id: 294d442c71f75cd44a2ca1a77c6716f4
Mount: 10.70.42.84:vol_33bed42d0e5eec32a53dd5b8ff35b509
Mount Options: backup-volfile-servers=10.70.41.217,10.70.42.223
Block: true
Free Size: 172
Block Volumes: [0ae9027b9f70780f0316b1b8448dfe7d 0f02c63dc9454a107d5c533cbf846c60 1559a190e895726b917039041d6dd6e9 18a6f7397c14d5600b6b379b04de9055 199b92ecde07beb039b2d12781631103 2c9c161faca355e57f6bc00b45f9c316 2f2dc8a9881b45374faa99e3b8194c66 363c37a33e2f43148fab89f360ac4b76 3f361422d5a4e3b7544dfde919798ef5 43e118d399dba856b33ea5db93d7a621 4a5e22950c09d8fdf7475aac6d06a574 4ae2e3acaffc65dc5d1e761901ba5881 505a7d71d2435bd1a574e0d7d4b8499b 55404027415b32b330376b861efee128 5a729b8e8a04817b34dcec4c4a3ae3a9 5fc3cb2a98f73dfb902b0ba144e4ea4a 60341462c4e96beca5c7aa597529e40f 62fc1df4c7edcadc583254e95d141a6e 69b3a4040de5d847fc4157c5149ce0c8 71862bf3ac18087d28e67dc1ab36db65 7b6c1f877e0ffd419e839f45aedbaaf3 82217a727c58d622ccdb0d654a0ee41f 844e171e3b1382fce1fcbf2d5450da51 89f6e9cf7ab1a7cab916bdc5ab89aa6e 8bec3690e3b951703f75b91361667e19 937df2d124e891abb3862cc83b27aabd a69a784c9104ffbcdec2bf9d8f3a141d a8f61806fc9bafa10c404db7b9ef8911 ab96e32c6902093190f1317c24aab0bf aee807e76f7063f7e38054168b80d86f b931170c9a5f7fe802e1d82eb45525e8 b9c61ce96f064de6aad3ccc7cda51337 bbd544539eb6ab50b174f04fc2090f40 bd35030c86a17b309a3e5059636c7363 c4243252bd429b4b0cd4f87a5d984b5a c5612338684a8406503b05b220c77e93 d516ce72aa3dcdcd10c2405aebebae05 d65e89b906a709cc90674b947427dbe4 e0431473be783e8d7f18827dd8b06080 e3f55620749b3c7bcec4dd0e670250f4 e66bc79263c12cb0a993710ff6641d05 ed44482204cc49aec2e37b0cc2bc67a0 f75a39d89e973cd567395fbe7e460e03 f9dfff25d32619b201ac461150bbb6b5]
Durability Type: replicate
Distributed+Replica: 3
[root@dhcp42-137 ~]# 

---------------------------------------------------------------
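
A setup can be checked for this discrepancy before upgrading by comparing the two fields in the output above. The following Go sketch is one way to do that, assuming the text layout of `heketi-cli volume info` shown in this report; freeExceedsSize is a hypothetical helper, not part of heketi.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// freeExceedsSize scans the text output of `heketi-cli volume info` and
// reports whether "Free Size" is larger than "Size". ok is false when
// either field is missing or is not an integer.
func freeExceedsSize(info string) (exceeds bool, ok bool) {
	size, free := -1, -1
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue // not a "Key: value" line
		}
		n, err := strconv.Atoi(strings.TrimSpace(parts[1]))
		if err != nil {
			continue // value is not a plain integer (e.g. Name, Mount)
		}
		switch strings.TrimSpace(parts[0]) {
		case "Size":
			size = n
		case "Free Size":
			free = n
		}
	}
	if size < 0 || free < 0 {
		return false, false
	}
	return free > size, true
}

func main() {
	// Abbreviated copy of the output shown in this report.
	sample := "Name: vol_33bed42d0e5eec32a53dd5b8ff35b509\n" +
		"Size: 100\n" +
		"Block: true\n" +
		"Free Size: 172\n"
	exceeds, ok := freeExceedsSize(sample)
	fmt.Println("parsed:", ok, "free size exceeds size:", exceeds)
}
```

The helper only flags the condition; it does not attempt to correct the value stored in the heketi database.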


How reproducible
+++++++++++++++++++

The issue is seen for every new blockvolume create.

Steps to reproduce
+++++++++++++++++++
1. Create a CNS 3.9 setup with LIVE builds (heketi-6.0.0.7-4).
2. Create multiple blockvolumes. You may hit a case where the free size of the block hosting volume > the actual size of the hosting volume:
https://bugzilla.redhat.com/show_bug.cgi?id=1584123

3. With the above issue still present in the setup, upgrade the glusterfs pods and heketi to CNS 3.10. The heketi version should be later than 7.0.0.2.

4. Once the system is upgraded, try creating a block volume of size, say, 5 GB.

5. The creation will fail and there will be a PANIC message from heketi.

Comment 17 errata-xmlrpc 2018-09-12 09:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

