Description of problem:
=======================
Hit this while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1474256 - a block continuing to exist in a non-healthy state after a failed delete.

The 'gluster-block delete' command works as expected: if the gluster-blockd service is brought down on one of the HA nodes in the middle of a deletion, the delete command still goes ahead and removes the block. No stale entries of the block remain, and the block's meta information no longer exists.

However, when the 'gluster-block create' command is executed and gluster-blockd goes down on a node in the middle of it, rollback of the internal changes does not happen cleanly. Supporting output is pasted below.

gluster-block logs and sosreports have been copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Version-Release number of selected component (if applicable):
=============================================================
gluster-block-0.2.1-8 and glusterfs-3.8.4-42

How reproducible:
=================
Multiple times

Additional info:
================
The block 'bki' continues to exist in a partial state after a failed create.

[root@dhcp47-121 block-meta]# for i in {a,b,c,d,e}; do gluster-block create ozone/bk$i ha 2 10.70.47.121,10.70.47.113 20M
> done
IQN: iqn.2016-12.org.gluster-block:26c9bf4d-b70d-4b9d-a8eb-bbf63ae1dc5d
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:2b6cbb1c-b7c9-4fce-985d-c17052d6e068
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:aa84393d-c82c-40b1-b5f2-d4c725ba0a1a
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:f34d5aa8-c4aa-4bf7-ad49-eb6094d89688
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
failed to configure on 10.70.47.113 : Connection refused
RESULT:FAIL

[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd

[root@dhcp47-121 block-meta]# ls
bk1  bka  bkb  bkc  bkd  meta.lock

[root@dhcp47-121 block-meta]# for i in {f,g,h,i,j}; do gluster-block create ozone/bk$i ha 3 10.70.47.121,10.70.47.113,10.70.47.114 50M; done
IQN: iqn.2016-12.org.gluster-block:56afaae1-05fd-4e16-ba9f-7938eb6387bc
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:5d5e6eb1-5e73-4fb3-a14b-013e634478ef
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:496cf83d-6f89-4808-975e-bd5a0033542b
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:cdd2a854-9ea7-4236-aeba-6321fa7b4247
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
ROLLBACK ON: 10.70.47.114 10.70.47.113 10.70.47.121
RESULT: FAIL
failed to configure on 10.70.47.114 : Connection refused
RESULT:FAIL

[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd
bkg
bki
bkf
bkh

[root@dhcp47-121 block-meta]# ls
bk1  bka  bkb  bkc  bkd  bkf  bkg  bkh  bki  meta.lock

[root@dhcp47-121 block-meta]# cat bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
ENTRYCREATE: INPROGRESS
ENTRYCREATE: SUCCESS
10.70.47.113: CONFIGINPROGRESS
10.70.47.114: CONFIGINPROGRESS
10.70.47.121: CONFIGINPROGRESS
10.70.47.114: CONFIGFAIL
10.70.47.113: CONFIGSUCCESS
10.70.47.121: CONFIGSUCCESS
10.70.47.114: CLEANUPINPROGRESS
10.70.47.113: CLEANUPINPROGRESS
10.70.47.121: CLEANUPINPROGRESS
10.70.47.113: CLEANUPSUCCESS
10.70.47.121: CLEANUPSUCCESS

[root@dhcp47-121 block-meta]# gluster-block info ozone/bki
NAME: bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.114
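For reference, the failure above was hit by interleaving the create loop with a gluster-blockd stop on one of the HA peers. A minimal reproduction sketch along those lines (the target peer, volume name, sizes, and the sleep before stopping the service are illustrative, not taken from the original run):

#!/bin/bash
# Reproduction sketch: run gluster-block creates while gluster-blockd is
# stopped on one HA peer mid-way. Peer/volume/size values are illustrative.
VOL=ozone
PEERS=10.70.47.121,10.70.47.113,10.70.47.114
FAIL_PEER=10.70.47.114          # peer whose gluster-blockd will be stopped

# Stop gluster-blockd on one peer a few seconds into the create loop.
( sleep 5 && ssh root@"$FAIL_PEER" systemctl stop gluster-blockd ) &

for i in f g h i j; do
    gluster-block create "$VOL"/bk"$i" ha 3 "$PEERS" 50M
done
wait

# Inspect what the failed create(s) left behind.
gluster-block list "$VOL"
gluster-block info "$VOL"/bki
# The block's meta file on the backing volume (block-meta directory) can
# also be inspected, as in the 'cat bki' output above.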
Have received confirmation from CNS QE that this will not be an issue with heketi. Not proposing it as a blocker for RHGS 3.3.0.
Re-verified this on the builds glusterfs-3.8.4-54.12, gluster-block-0.2.1-20, and tcmu-runner-1.2.0-20. Executed gluster-block creates multiple times in a loop and did not see any failed creates. Also killed the gluster-blockd service on one of the HA peers in the middle, and that was handled gracefully. Did not see any traceback other than the one already tracked in BZ 1595176. Moving this bug to VERIFIED. Detailed logs will be attached.
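For context, a rough sketch of the kind of loop used for re-verification (volume name, peer list, iteration counts, and the post-check are illustrative; the actual runs are in the attached logs):

#!/bin/bash
# Verification sketch: repeated creates with a gluster-blockd restart on one
# HA peer mid-loop, then a check that every block reports all config nodes.
VOL=ozone
PEERS=10.70.47.121,10.70.47.113,10.70.47.114

for run in $(seq 1 5); do
    ( sleep 5 && ssh root@10.70.47.114 systemctl restart gluster-blockd ) &
    for i in $(seq 1 10); do
        gluster-block create "$VOL"/blk"${run}_${i}" ha 3 "$PEERS" 50M \
            || echo "create failed: blk${run}_${i}"
    done
    wait
done

# A cleanly created block should list all HA peers as its config nodes.
for b in $(gluster-block list "$VOL"); do
    gluster-block info "$VOL"/"$b" | grep 'BLOCK CONFIG NODE'
done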
Created attachment 1455448 [details]
verification logs

rhgs331-async
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691