Bug 1474256
Summary: | [Gluster-block]: Block continues to exist in a non-healthy state after a failed delete | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Sweta Anandpara <sanandpa>
Component: | gluster-block | Assignee: | Prasanna Kumar Kalever <prasanna.kalever>
Status: | CLOSED ERRATA | QA Contact: | Sweta Anandpara <sanandpa>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.3 | CC: | amukherj, rcyriac, rhs-bugs, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.3.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | gluster-block-0.2.1-8.el7rhgs | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-09-21 04:20:54 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1417151 | |
Description
Sweta Anandpara
2017-07-24 08:59:22 UTC
Sosreports at the location http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

[qe@rhsqe-repo 1474256]$ pwd
/home/repo/sosreports/1474256
[qe@rhsqe-repo 1474256]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1474256]$ ll
total 12
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:30 gluster-block_dhcp47-113
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:31 gluster-block_dhcp47-114
drwxr-xr-x. 2 qe qe 4096 Jul 24 14:29 gluster-block_dhcp47-121
[qe@rhsqe-repo 1474256]$

> So, the design idea was that the user might run a cron job, ideally its duty is to look at the partially failed blocks and issue a delete every now and then.
Firstly, "issuing a delete every now and then" does not sound like a good design decision, nor is it safe, given the probability of something going wrong _every_now_and_then_.
Secondly, where is the book-keeping of partially failed blocks done? How do we know whether a block is good or bad?
Lastly, why rely on a cron job, or on the user, for the required clean-up? The moment we expose partially-baked data to the user, there is a greater chance of the user introducing harm to the system/environment.
My knowledge is a little limited in this area, but could you please give an example/reference of any other storage product that we ship where we leave failed creates/deletes in the system as-is, rather than rolling back (or taking corrective measures on) the internal changes that have partially proceeded?
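For context, the cron-job idea being debated would amount to something like the sketch below. This is purely hypothetical, not a shipped tool: `sweep_partial`, the assumption that the volume's block-meta directory is mounted locally, and the "last recorded state per node" heuristic are all illustrative guesses based on the meta files shown later in this bug.

```shell
# sweep_partial VOL METADIR: re-issue 'gluster-block delete' for every block
# whose meta file shows a node whose last recorded state is not a *SUCCESS
# state. Hypothetical helper; METADIR is assumed to be the block-meta
# directory of the hosting volume, mounted locally, with per-block meta
# files named after the blocks (bk* in this bug's transcripts).
sweep_partial() {
    vol=$1; metadir=$2
    for f in "$metadir"/bk*; do
        [ -f "$f" ] || continue
        if awk -F': ' '/^[0-9]/ { s[$1] = $2 }
                END { for (n in s) if (s[n] !~ /SUCCESS$/) exit 1 }' "$f"
        then
            : # every node ended in a SUCCESS state; leave the block alone
        else
            gluster-block delete "$vol/$(basename "$f")"
        fi
    done
}
# e.g. run from cron: sweep_partial ozone /mnt/ozone/block-meta
```

Even with such a sweep, the concerns above stand: the "book-keeping" lives only in the meta files, and the delete is re-issued blindly from the client side.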
Tested and verified this on the builds gluster-block-0.2.1-8 and glusterfs-3.8.4-42.
The 'gluster-block delete' command is working as expected. If the gluster-blockd service is brought down on one of the HA nodes in the middle of a deletion, the block-delete command still goes ahead and deletes the block. No stale entries of the block remain, and the meta information of the block no longer exists.
However, when 'gluster-block create' is executed with the gluster-blockd service going down in the middle, the rollback of internal changes does not happen cleanly. Pasted below is the supporting output. The block 'bki' continues to exist in a partial state after a failed create.
[root@dhcp47-121 block-meta]#
[root@dhcp47-121 block-meta]# for i in {a,b,c,d,e}; do gluster-block create ozone/bk$i ha 2 10.70.47.121,10.70.47.113 20M
> done
IQN: iqn.2016-12.org.gluster-block:26c9bf4d-b70d-4b9d-a8eb-bbf63ae1dc5d
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:2b6cbb1c-b7c9-4fce-985d-c17052d6e068
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:aa84393d-c82c-40b1-b5f2-d4c725ba0a1a
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:f34d5aa8-c4aa-4bf7-ad49-eb6094d89688
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
RESULT: SUCCESS
failed to configure on 10.70.47.113 : Connection refused
RESULT:FAIL
[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd
[root@dhcp47-121 block-meta]# ls
bk1 bka bkb bkc bkd meta.lock
[root@dhcp47-121 block-meta]#
[root@dhcp47-121 block-meta]#
[root@dhcp47-121 block-meta]# for i in {f,g,h,i,j}; do gluster-block create ozone/bk$i ha 3 10.70.47.121,10.70.47.113,10.70.47.114 50M; done
IQN: iqn.2016-12.org.gluster-block:56afaae1-05fd-4e16-ba9f-7938eb6387bc
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:5d5e6eb1-5e73-4fb3-a14b-013e634478ef
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:496cf83d-6f89-4808-975e-bd5a0033542b
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260 10.70.47.114:3260
RESULT: SUCCESS
IQN: iqn.2016-12.org.gluster-block:cdd2a854-9ea7-4236-aeba-6321fa7b4247
PORTAL(S): 10.70.47.121:3260 10.70.47.113:3260
ROLLBACK ON: 10.70.47.114 10.70.47.113 10.70.47.121
RESULT: FAIL
failed to configure on 10.70.47.114 : Connection refused
RESULT:FAIL
[root@dhcp47-121 block-meta]# gluster-block list ozone
bk1
bka
bkb
bkc
bkd
bkg
bki
bkf
bkh
[root@dhcp47-121 block-meta]# ls
bk1 bka bkb bkc bkd bkf bkg bkh bki meta.lock
[root@dhcp47-121 block-meta]# cat bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
ENTRYCREATE: INPROGRESS
ENTRYCREATE: SUCCESS
10.70.47.113: CONFIGINPROGRESS
10.70.47.114: CONFIGINPROGRESS
10.70.47.121: CONFIGINPROGRESS
10.70.47.114: CONFIGFAIL
10.70.47.113: CONFIGSUCCESS
10.70.47.121: CONFIGSUCCESS
10.70.47.114: CLEANUPINPROGRESS
10.70.47.113: CLEANUPINPROGRESS
10.70.47.121: CLEANUPINPROGRESS
10.70.47.113: CLEANUPSUCCESS
10.70.47.121: CLEANUPSUCCESS
[root@dhcp47-121 block-meta]#
[root@dhcp47-121 block-meta]# gluster-block info ozone/bki
NAME: bki
VOLUME: ozone
GBID: cdd2a854-9ea7-4236-aeba-6321fa7b4247
SIZE: 52428800
HA: 3
PASSWORD:
BLOCK CONFIG NODE(S): 10.70.47.114
[root@dhcp47-121 block-meta]#
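The meta file dumped above is effectively an append-only state log, so a node's health is given by its last recorded entry; for 'bki', node 10.70.47.114 is left at CLEANUPINPROGRESS while the others reach CLEANUPSUCCESS. A minimal sketch that surfaces this (assuming only the format visible in the `cat bki` output; `check_block` is a hypothetical helper, not part of gluster-block):

```shell
# check_block METAFILE: print each node whose last recorded state in a
# gluster-block meta file is not a *SUCCESS state (hypothetical helper).
# Node entries are the lines starting with an IP address; later entries
# for the same node supersede earlier ones.
check_block() {
    awk -F': ' '/^[0-9]/ { state[$1] = $2 }
        END { for (n in state) if (state[n] !~ /SUCCESS$/)
                  printf "%s left in %s\n", n, state[n] }' "$1"
}
# e.g. check_block /path/to/block-meta/bki
```

Run against the 'bki' meta file above, this reports only 10.70.47.114, which matches the stray "BLOCK CONFIG NODE(S): 10.70.47.114" in the `gluster-block info` output.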
Gluster-block logs and sosreports copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/new/

Related change: https://review.gluster.org/#/c/18131/

Have raised https://bugzilla.redhat.com/show_bug.cgi?id=1490818 for the concern mentioned in comment 10, as instructed in comment 14.

Moving this bug to verified in rhgs 3.3.0. Supporting logs present in comment 10.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773