Description of problem:
=======================
Hit this while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1449245.

While deleting a block, if the gluster-block daemon on one of the HA nodes goes down, the CLI reports the delete to have failed. However, the block would have undergone deletion on the other HA nodes (which were up), so the block, even though it still exists, is not fully functional. This leads to unpredictable behaviour of the block, for example:

* Initiator discovery succeeds and login succeeds, but lsblk does not show the block device.
* Meta information shows the CLEANUP to have taken place on one node but not on the other.
* A new create of a block with the same name fails with the error 'block already exists'.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-33 and gluster-block-0.2.1-6

How reproducible:
=================
2:2

Steps to Reproduce:
===================
1. Create a block with ha=3 on node1, node2 and node3
2. Execute block delete on node1, and kill the gluster-block daemon on node2
3. Bring back the gluster-block daemon on node2

(A minimal shell sketch of these steps follows this comment.)

Actual results:
===============
Step 2 reports the delete to have failed. Meta information of the block shows that the delete has succeeded on node1 and node3, and failed on node2. 'gluster-block info <volname>/<blockname>' shows the block to be present, with the parameter 'BLOCK CONFIG NODE(S)' listing only node2.

Expected results:
=================
The block should not be left in a non-healthy/partial state after a failed create or delete. The changes that have taken place in the system should either be rolled back, or explicit guidance has to be given to take corrective measures on such sub-optimal blocks.
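For reference, a minimal shell sketch of the reproduction steps above, assuming a three-node setup like the one in this report (the hostnames, volume/block names, and the use of systemctl to stop the daemon are illustrative):

# Step 1 -- on node1: create a block with ha=3 across the three nodes
gluster-block create testvol/testblock ha 3 10.70.47.121,10.70.47.113,10.70.47.114 1M

# Step 2 -- on node2: take the gluster-block daemon down ...
systemctl stop gluster-blockd

# ... and on node1: issue the delete; this is the step that reports FAIL
gluster-block delete testvol/testblock

# Step 3 -- on node2: bring the daemon back
systemctl start gluster-blockd

# The block is now half-deleted: info still lists it, with only node2
# left under BLOCK CONFIG NODE(S)
gluster-block info testvol/testblock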
Additional info:
================
[root@dhcp47-121 ~]# gluster v info testvol

Volume Name: testvol
Type: Replicate
Volume ID: 35a0b1a7-0dc3-4536-96aa-bd181b91c381
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.121:/bricks/brick2/testvol0
Brick2: 10.70.47.113:/bricks/brick2/testvol1
Brick3: 10.70.47.114:/bricks/brick2/testvol2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: disable
cluster.enable-shared-storage: enable

[root@dhcp47-121 ~]# gluster-block info testvol/*
block name(*) should contain only aplhanumeric,'-' and '_' characters
[root@dhcp47-121 ~]# gluster-block info testvol/testblock
NAME: testblock
VOLUME: testvol
GBID: 80701a23-8483-46fc-a7aa-ef4588e036ba
SIZE: 1048576
HA: 2
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.113 10.70.47.121
[root@dhcp47-121 ~]# gluster-block info testvol/bk1
NAME: bk1
VOLUME: testvol
GBID: 4949789d-dc5c-47d6-a18e-7fc09c988d62
SIZE: 1048576
HA: 3
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.114 10.70.47.113 10.70.47.121
[root@dhcp47-121 ~]# gluster-block info testvol/bk2
NAME: bk2
VOLUME: testvol
GBID: 0288af11-603a-42f7-896f-d2fc498b900f
SIZE: 1048576
HA: 1
PASSWORD: 8ed48837-f22b-43f1-911c-6e635ca7a711
BLOCK CONFIG NODE(S): 10.70.47.114
[root@dhcp47-121 ~]# gluster-block info testvol/bk4
NAME: bk4
VOLUME: testvol
GBID: 626a71ca-2372-420b-b0b4-b449fa2d6f88
SIZE: 1048576
HA: 2
PASSWORD: c18af2c1-fc49-4ec5-bdd4-a1b3b358e905
BLOCK CONFIG NODE(S): 10.70.47.114 10.70.47.113

[root@dhcp47-121 ~]# gluster-block delete
Inadequate arguments for delete:
gluster-block delete <volname/blockname> [--json*]
[root@dhcp47-121 ~]# gluster-block delete testvol/bk1
FAILED ON: 10.70.47.113
SUCCESSFUL ON: 10.70.47.114 10.70.47.121
RESULT: FAIL
[root@dhcp47-121 ~]# gluster-block info testvol/bk1
NAME: bk1
VOLUME: testvol
GBID: 4949789d-dc5c-47d6-a18e-7fc09c988d62
SIZE: 1048576
HA: 3
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.113
[root@dhcp47-121 ~]# mount | grep testvol
[root@dhcp47-121 ~]# gluster-block list testvol
testblock
bk1
bk2
bk4
[root@dhcp47-121 ~]# mkdir /mnt/testvol
[root@dhcp47-121 ~]# mount -t glusterfs 10.70.47.121:testvol /mnt/testvol
[root@dhcp47-121 ~]# cd /mnt/testvol
[root@dhcp47-121 testvol]# cd block-meta/
[root@dhcp47-121 block-meta]# cat bk1
VOLUME: testvol
GBID: 4949789d-dc5c-47d6-a18e-7fc09c988d62
SIZE: 1048576
HA: 3
ENTRYCREATE: INPROGRESS
ENTRYCREATE: SUCCESS
10.70.47.114: CONFIGINPROGRESS
10.70.47.113: CONFIGINPROGRESS
10.70.47.121: CONFIGINPROGRESS
10.70.47.113: CONFIGSUCCESS
10.70.47.114: CONFIGSUCCESS
10.70.47.121: CONFIGSUCCESS
10.70.47.114: CLEANUPINPROGRESS
10.70.47.121: CLEANUPINPROGRESS
10.70.47.113: CLEANUPINPROGRESS
10.70.47.114: CLEANUPSUCCESS
10.70.47.121: CLEANUPSUCCESS
[root@dhcp47-121 block-meta]#
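The asymmetry is visible directly in the block-meta file above: all three nodes logged CLEANUPINPROGRESS for bk1, but only 10.70.47.114 and 10.70.47.121 reached CLEANUPSUCCESS. A rough way to flag such half-cleaned blocks from the meta files (a sketch only; the mount point and the awk logic are illustrative, not shipped tooling):

# For every block-meta file, report nodes that started cleanup but never finished
for f in /mnt/testvol/block-meta/*; do
  [ "$(basename "$f")" = "meta.lock" ] && continue
  awk '/CLEANUPINPROGRESS/ { started[$1] = 1 }
       /CLEANUPSUCCESS/    { delete started[$1] }
       END { for (n in started) print FILENAME ": cleanup incomplete on " n }' "$f"
done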
Sosreports at the location http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

[qe@rhsqe-repo 1474256]$ pwd
/home/repo/sosreports/1474256
[qe@rhsqe-repo 1474256]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1474256]$ ll
total 12
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:30 gluster-block_dhcp47-113
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:31 gluster-block_dhcp47-114
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:29 gluster-block_dhcp47-121
[qe@rhsqe-repo 1474256]$
> So, the design idea was that the user might run a cron job, ideally its duty is to look at the partially failed blocks and issue a delete every now and then.

Firstly, "issuing a delete every now and then" does not sound like a good design decision, nor is it safe, given the probability of something going wrong _every_now_and_then_.

Secondly, where is the book-keeping of partially failed blocks done? How do we get to know whether a block is good or bad?

Lastly, why rely on a cron job, or on the user, for the required clean-up? The moment we expose partially-baked data to the user, there is more chance for the user to introduce harm to the system/environment.

My knowledge is a little limited in this area, but could you please give an example/reference of any other storage device that we ship where we leave failed creates/deletes present in the system as-is, and do not roll back (or take corrective measures on) the internal changes that have partially proceeded?
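For context, the cron-based approach being questioned would presumably look something like the sketch below, which makes the book-keeping gap concrete: the list of partially failed blocks has to come from somewhere, and nothing maintains it today. (Everything here is hypothetical: the failed-blocks file, its path, and the wrapper itself are not part of gluster-block.)

#!/bin/sh
# Hypothetical cron job: retry deletes for blocks recorded as partially failed.
# NOTE: gluster-block does no such book-keeping itself; this list would have
# to be maintained by the operator or by wrapper tooling around the CLI.
FAILED_LIST=/var/lib/gluster-block/failed-blocks   # hypothetical path
: > "$FAILED_LIST.retry"
while read -r block; do
  # Re-issue the delete; keep the entry for the next run if it fails again
  gluster-block delete "$block" || echo "$block" >> "$FAILED_LIST.retry"
done < "$FAILED_LIST"
mv "$FAILED_LIST.retry" "$FAILED_LIST"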
Tested and verified this on the build gluster-block-0.2.1-8 and glusterfs-3.8.4-42.

'gluster-block delete' is working as expected. If the gluster-blockd service is brought down on one of the HA nodes in the middle of deletion, the block-delete command does go ahead and delete the block. No stale entries of the block remain, and the meta information of the block no longer exists.

However, when 'gluster-block create' is executed and the gluster-blockd service goes down in the middle, the rollback of internal changes does not happen cleanly. Pasted below is the supporting output: the block 'bki' continues to exist in a partial state after a failed create.

[root@dhcp47-121 block-meta]# for i in {a,b,c,d,e}; do gluster-block create ozone/bk$i ha 2 10.70.47.121,10.70.47.113 20M
> done
IQN: iqn.2016-12.org.gluster-block:26c9bf4d-b70d-4b9d-a8eb-bbf63ae1dc5d
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:2b6cbb1c-b7c9-4fce-985d-c17052d6e068
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:aa84393d-c82c-40b1-b5f2-d4c725ba0a1a
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:f34d5aa8-c4aa-4bf7-ad49-eb6094d89688
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
failed to configure on 10.70.47.113 : Connection refused
RESULT:FAIL
[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd
[root@dhcp47-121 block-meta]# ls
bk1  bka  bkb  bkc  bkd  meta.lock

[root@dhcp47-121 block-meta]# for i in {f,g,h,i,j}; do gluster-block create ozone/bk$i ha 3 10.70.47.121,10.70.47.113,10.70.47.114 50M; done
IQN: iqn.2016-12.org.gluster-block:56afaae1-05fd-4e16-ba9f-7938eb6387bc
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:5d5e6eb1-5e73-4fb3-a14b-013e634478ef
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:496cf83d-6f89-4808-975e-bd5a0033542b
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:cdd2a854-9ea7-4236-aeba-6321fa7b4247
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
ROLLBACK ON: 10.70.47.114 10.70.47.113 10.70.47.121
RESULT: FAIL
failed to configure on 10.70.47.114 : Connection refused
RESULT:FAIL
[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd
bkg
bki
bkf
bkh
[root@dhcp47-121 block-meta]# ls
bk1  bka  bkb  bkc  bkd  bkf  bkg  bkh  bki  meta.lock
[root@dhcp47-121 block-meta]# cat bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
ENTRYCREATE: INPROGRESS
ENTRYCREATE: SUCCESS
10.70.47.113: CONFIGINPROGRESS
10.70.47.114: CONFIGINPROGRESS
10.70.47.121: CONFIGINPROGRESS
10.70.47.114: CONFIGFAIL
10.70.47.113: CONFIGSUCCESS
10.70.47.121: CONFIGSUCCESS
10.70.47.114: CLEANUPINPROGRESS
10.70.47.113: CLEANUPINPROGRESS
10.70.47.121: CLEANUPINPROGRESS
10.70.47.113: CLEANUPSUCCESS
10.70.47.121: CLEANUPSUCCESS
[root@dhcp47-121 block-meta]# gluster-block info ozone/bki
NAME: bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.114
[root@dhcp47-121 block-meta]#
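The partial state of bki is also detectable from the CLI alone, by comparing a block's HA count against the number of nodes still listed in its info output. A sketch (the parsing of the info fields is illustrative):

# Flag blocks whose surviving config nodes are fewer than their HA count
vol=ozone
for blk in $(gluster-block list "$vol"); do
  gluster-block info "$vol/$blk" | awk -v b="$vol/$blk" '
    /^HA:/                     { ha = $2 }
    /^BLOCK CONFIG NODE\(S\):/ { nodes = NF - 3 }  # fields after the label
    END { if (nodes < ha) printf "%s: only %d of %d config nodes left\n", b, nodes, ha }'
done

Against the output above, this reports bki as having only 1 of 3 config nodes.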
Gluster-block logs and sosreports copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/new/
Related change: https://review.gluster.org/#/c/18131/
Have raised https://bugzilla.redhat.com/show_bug.cgi?id=1490818 for the concern mentioned in comment 10, as instructed in comment 14. Moving this bug to verified in RHGS 3.3.0. Supporting logs are present in comment 10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2773