Bug 1403840
Summary: | [GSS]xattr 'replica.split-brain-status' shows the file is in data-splitbrain but "heal split-brain latest-mtime" fails | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Riyas Abdulrasak <rnalakka> |
Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.1 | CC: | amukherj, asriram, nchilaka, pkarampu, ravishankar, rcyriac, rhinduja, rhs-bugs, rnalakka, ssampat, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | RHGS 3.2.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-9 | Doc Type: | Bug Fix |
Doc Text: |
Previously, split-brain resolution commands issued from the command line interface did not work when client-side self-heals were disabled, and returned incorrect output indicating that the file was not in split-brain. This update ensures that split-brain resolution commands work regardless of whether client-side heal or the self-heal daemon is enabled.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-23 05:56:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1216951, 1223636, 1234054, 1351528, 1351530, 1405126, 1405130 |
Description
Riyas Abdulrasak
2016-12-12 13:12:27 UTC
bz 1233608 already captures this behavior. Ravi had posted a patch for this long back, but we were not sure how glfsheal should behave when the self-heal daemon is disabled, as per that bz. Could you let us know what you expect from this CLI? If you feel split-brain resolution should resolve the file regardless of the options set on the volume, then we can take Ravi's patch in. Do let us know your thoughts on this.

Had a discussion with Riyas about this issue. The behavior users of these CLIs expect is that split-brain resolution succeeds whether or not self-heal-daemon/entry/data/metadata heal are enabled. So we can take in Ravi's patch after rebase to fix the same. http://review.gluster.org/11333

Giving ack based on comment #3.

Downstream patch @ https://code.engineering.redhat.com/gerrit/#/c/93108

Note: As noted in comment#2, this patch is also for BZ 1233608. I'm not sure if we should close that as a duplicate of this one, considering this is the one which has all acks.

*** Bug 1233608 has been marked as a duplicate of this bug. ***

qatp:
====
First reproduced the problem on 3.8.4-3 (before the fix), then verified on the build with the fix, 3.8.4-11.

The heal was successful when I chose to heal a split-brain file using latest-mtime (`gluster v heal rep2 split-brain latest-mtime <filename>`) ==> passed

As part of regression testing, I also made sure that healing a split-brain file with source-brick passed ==> passed

```
Brick 10.70.35.239:/rhs/brick1/rep2
/bigfile - Is in split-brain
/srcbrk - Is in split-brain
Status: Connected
Number of entries: 2

[root@dhcp35-37 rep2]# gluster v heal rep2 split-brain source-brick 10.70.35.239:/rhs/brick1/rep2 /srcbrk
Healed /srcbrk.
```

==> with mtime:

```
[root@dhcp35-37 rep2]# gluster v heal rep2 split-brain latest-mtime /f1
Healed /f1.
```
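Conceptually, the latest-mtime policy chooses the replica copy with the newest modification time as the heal source. A minimal Python sketch of that selection, under the assumption that each brick path holds one copy of the file (this is a reading aid for the policy, not gluster's actual implementation):

```python
import os


def pick_latest_mtime_source(brick_paths, relpath):
    """Among replica copies of `relpath`, return the brick whose copy has
    the newest modification time (the latest-mtime policy sketched here)."""
    candidates = [
        (brick, os.stat(os.path.join(brick, relpath.lstrip("/"))).st_mtime)
        for brick in brick_paths
    ]
    # max() over mtime picks the most recently written copy as heal source
    return max(candidates, key=lambda pair: pair[1])[0]
```

For example, given two local directories standing in for the bricks, the one whose copy of `f1` was written later is selected as the source.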
Note: However, when choosing the bigger-file option to heal split-brain, the healing happens as expected in the background, but the CLI reports the heal as failed, as below:

```
[root@dhcp35-37 rep2]# gluster v heal rep2 split-brain bigger-file /testbig
Healing /testbig failed: File not in split-brain.
Volume heal failed.

[root@dhcp35-37 rep2]# gluster v heal rep2 info
Brick 10.70.35.116:/rhs/brick1/rep2
/testbig - Is in split-brain

[root@dhcp35-37 rep2]# gluster v heal rep2 split-brain bigger-file /testbig
Healing /testbig failed: File not in split-brain.
Volume heal failed.
```

===> Because of this, the file is seen as heal pending. I have tested with two files and both end up with a pending heal:

```
[root@dhcp35-37 rep2]# gluster v heal rep2 info
Brick 10.70.35.116:/rhs/brick1/rep2
/bigfile
/testbig
Status: Connected
Number of entries: 2

Brick 10.70.35.239:/rhs/brick1/rep2
Status: Connected
Number of entries: 0

[root@dhcp35-37 rep2]#
/bigfile
Status: Connected
Number of entries: 2

Brick 10.70.35.239:/rhs/brick1/rep2
/testbig - Is in split-brain
Status: Connected
Number of entries: 1
```

Backend bricks ==> heal is successful:

```
[root@dhcp35-116 ~]# md5sum /rhs/brick1/rep2/testbig
031bf15433a0c324c3c36b03b4ea384c  /rhs/brick1/rep2/testbig
[root@dhcp35-239 ~]# md5sum /rhs/brick1/rep2/testbig
031bf15433a0c324c3c36b03b4ea384c  /rhs/brick1/rep2/testbig
```

```
Volume Name: rep2
Type: Replicate
Volume ID: 778d60b1-981b-4a33-9ed7-a7c09a389fa4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.116:/rhs/brick1/rep2
Brick2: 10.70.35.239:/rhs/brick1/rep2
Options Reconfigured:
cluster.self-heal-daemon: disable
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp35-37 rep2]# gluster v status rep2
Status of volume: rep2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.116:/rhs/brick1/rep2         49160     0          Y       18035
Brick 10.70.35.239:/rhs/brick1/rep2         49160     0          Y       15909

Task Status of Volume rep2
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-37 rep2]#
```

Note: when testing the split-brain bigger-file option, before this fix the heal failed and the CLI also errored out as below:

```
[root@dhcp35-192 rep2]# gluster v heal rep2 split-brain bigger-file /bigfile
Healing /bigfile failed: File not in split-brain.
Volume heal failed.
```

After the fix, for the split-brain bigger-file option, the healing passed (I checked the backend bricks, where the heal succeeded), but the CLI still says:

```
[root@dhcp35-37 rep2]# gluster v heal rep2 split-brain bigger-file /testbig
Healing /testbig failed: File not in split-brain.
Volume heal failed.
```

I just tried it on the latest downstream and it worked for me:

```
0:root@tuxpad ~$ gluster v heal testvol split-brain bigger-file /file
Healed /file.
```

1. Was there a size difference in the file among the replica bricks?
2. Do you have the setup where it failed?

Nag provided his setup, where I was able to hit the issue. When we 'dd' into an existing file with bricks brought up/down alternately to create a data split-brain, the dirty xattr was also being set:

```
[root@dhcp35-116 rep2]# getfattr -d -m . -e hex FILE
# file: FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000100000000
trusted.afr.rep2-client-1=0x000000020000000000000000
trusted.bit-rot.version=0x0200000000000000587c910100040687
trusted.gfid=0xe8051a5470d24be4a2b1ed446783bb02

[root@dhcp35-239 rep2]# g FILE
# file: FILE
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000100000000
trusted.afr.rep2-client-0=0x000000020000000000000000
trusted.bit-rot.version=0x0200000000000000587c909f000c3e53
trusted.gfid=0xe8051a5470d24be4a2b1ed446783bb02
```

Now when the split-brain CLI is invoked, it heals the data split-brain. Because the metadata dirty bit is also set, it tries to heal that too, but since the file is not in metadata split-brain, afr adds the "File not in split-brain" message to the dictionary and it gets printed. We could probably raise a separate bug for it, but IMO it is not a blocker: if the self-heal daemon/client-side heals were enabled (as they would be in a typical use case), there is a likelihood that the metadata heal would happen anyway and only the data split-brain would remain, so a subsequent run of the CLI would print the "Healed <file>" message.

I have raised a new BZ# https://bugzilla.redhat.com/show_bug.cgi?id=1413525 - resolving split-brain using "bigger-file" option fails - to track the issue.

Hence moving this to verified (see comment#10). Test build: 3.8.4-11.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
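For reference, the trusted.afr.* values in the getfattr output above are 12-byte changelog counters: three big-endian 32-bit unsigned integers counting pending data, metadata, and entry operations, in that order. A small Python sketch decoding them (a reading aid for the hex dumps, not gluster code):

```python
import struct


def decode_afr_xattr(hexval):
    """Split a trusted.afr.* value into (data, metadata, entry) pending
    operation counts: 12 bytes, three big-endian 32-bit unsigned ints."""
    if hexval.startswith("0x"):
        hexval = hexval[2:]
    raw = bytes.fromhex(hexval)
    return struct.unpack(">III", raw)


# Values from the getfattr output in the comments above:
# trusted.afr.rep2-client-1 -> (2, 0, 0): pending data ops on the peer,
#   set on both bricks against each other, i.e. a data split-brain
# trusted.afr.dirty -> (0, 1, 0): the stray metadata dirty bit that
#   triggered the "File not in split-brain" message
print(decode_afr_xattr("0x000000020000000000000000"))  # (2, 0, 0)
print(decode_afr_xattr("0x000000000000000100000000"))  # (0, 1, 0)
```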