Bug 1293349
| Summary: | AFR can ignore zero-size files while checking for split-brain | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RajeshReddy <rmekala> |
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED ERRATA | QA Contact: | Vijay Avuthu <vavuthu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | rhgs-3.1 | CC: | aspandey, nbalacha, nchilaka, olim, pkarampu, ravishankar, rhinduja, rhs-bugs, sankarshan, sheggodu |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | rebase | ||
| Fixed In Version: | glusterfs-3.12.2-1 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-09-04 06:27:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1503134 | ||
Description
RajeshReddy
2015-12-21 14:03:05 UTC
sosreport is available @ /home/repo/sosreports/bug.1293349 on rhsqe-repo.lab.eng.blr.redhat.com

Yes, a file on the hot tier is in split-brain state. On one node the cold tier contains the actual file, and on the other node it contains a link file.

```
[root@rhs-client18 ~]# gluster vol heal afr1x2_tier info split-brain
Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0
```

`gluster vol heal afr1x2_tier info split-brain` shows the file in split-brain, and moreover this file is not getting promoted. In a replica volume the user expects both nodes to contain the same data; in this case the two nodes do not have the same data.

Pranith Kumar K:

Rajesh,

I think we are on the same page. The file that is shown in split-brain is a link file, not the data file. And yes, the file won't get promoted until the split-brain is resolved on the hot tier. I didn't understand "In replica volume user expects both nodes should contain the same data in this case two nodes not having same data". Are you saying the two bricks which are in replication don't have the same data?

Pranith
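
For context on the link file mentioned above: on the brick, a DHT/tier link file appears as a zero-byte entry with mode `---------T` (sticky bit only) and a `*linkto` xattr naming the subvolume that holds the real data, as the getfattr output in the next comment shows. The sketch below is illustrative only, not GlusterFS code; the xattr name `trusted.tier.tier-dht.linkto` is taken from that output, and reading the `trusted.*` namespace requires root.

```c
/* Illustrative link-file check (not GlusterFS code): a DHT/tier link
 * file is zero bytes, has the sticky bit set ("---------T" in ls),
 * and carries a *linkto xattr naming the subvolume with the data. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-on-brick>\n", argv[0]);
        return 2;
    }

    struct stat st;
    if (lstat(argv[1], &st) != 0) {
        perror("lstat");
        return 1;
    }

    /* Xattr name as seen in the getfattr output in the next comment. */
    char linkto[256] = {0};
    ssize_t n = getxattr(argv[1], "trusted.tier.tier-dht.linkto",
                         linkto, sizeof(linkto) - 1);

    if (st.st_size == 0 && (st.st_mode & S_ISVTX) && n > 0)
        printf("link file -> %s\n", linkto);
    else
        printf("regular data file\n");
    return 0;
}
```

The hex value of that xattr in the output below decodes to the ASCII string "afr1x2_tier-cold-dht", i.e. the cold tier holds the actual data.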
RajeshReddy:

Earlier I was seeing differences between the two bricks in replication (one brick contained the actual file and the other contained a link file), but now both bricks have the same data. Once a file is in split-brain, promotion will fail, which is expected. Here the files in split-brain are zero size, so AFR can ignore these files while checking for split-brain.

```
[root@rhs-client18 tier]# gluster vol heal afr1x2_tier info split-brain
Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

[root@rhs-client18 tier]# cd /rhs/brick6/afr1x2_tier_hot
[root@rhs-client18 afr1x2_tier_hot]# ls
big  new  split  test
[root@rhs-client18 afr1x2_tier_hot]# cd new/
[root@rhs-client18 new]# ls -lrth
total 4.0K
---------T. 2 root root 0 Dec 21 18:18 one.txt
[root@rhs-client18 new]# getfattr -d -m . -e hex one.txt
# file: one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-2=0x000000010000000000000000
trusted.afr.dirty=0x000000030000000000000000
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400

[root@rhs-client19 ~]# cd /rhs/brick6/afr1x2_tier_hot/new
[root@rhs-client19 new]# getfattr -d -m . -e hex one.txt
# file: one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-3=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005677f3ad0007f386
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400
```

Pranith Kumar K:

We can take that as an enhancement: if the files are in data split-brain and both files have zero size, it will remove the split-brain automatically. Could you change the bug description to reflect the same?

Pranith

Ravishankar N:

Upstream patch https://review.gluster.org/#/c/18283

Ravishankar N:

(In reply to Ravishankar N from comment #14)
> Upstream patch https://review.gluster.org/#/c/18283

There is also a follow-up patch: https://review.gluster.org/#/c/18391/ (so 2 patches in total for this bug). Note that the fixes were sent as part of fixing BZ 1482812.

Vijay Avuthu:

Update:
=========

Build Used: glusterfs-3.12.2-7.el7rhgs.x86_64

Scenario 1:
1) Create a 1 * 2 volume and start it
2) Disable the self-heal daemon
3) Write a file (file1) with some content
4) Kill b0
5) Truncate the file to 0
6) Bring b0 up
7) Kill b1
8) Truncate the file to 0
9) Bring b1 up
10) Enable the self-heal daemon
11) Check heal info
12) Read the file from the client

After enabling the self-heal daemon, below is the heal info:

```
# gluster vol heal 12 info
Brick 10.70.35.61:/bricks/brick1/b0
/file1
/file_sb - Is in split-brain
Status: Connected
Number of entries: 2

Brick 10.70.35.174:/bricks/brick1/b1
<gfid:9362fdc1-26fa-47b7-b4e6-e066679fbf35>
<gfid:2d343643-19c0-4dda-b37f-1e995a3d1c9d> - Is in split-brain
Status: Connected
Number of entries: 2
```

From node 1:

```
# getfattr -d -m . -e hex /bricks/brick1/b0/file1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/b0/file1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x9362fdc126fa47b7b4e6e066679fbf35
trusted.gfid2path.d16e15bafe6e4256=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f66696c6531
```

From node 2:

```
# getfattr -d -m . -e hex /bricks/brick1/b1/file1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/b1/file1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-0=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x9362fdc126fa47b7b4e6e066679fbf35
trusted.gfid2path.d16e15bafe6e4256=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f66696c6531
```

Observe that for file1, even though the afr xattrs on the two bricks blame each other, the file is not reported as being in split-brain in heal info, which is expected; file1 was healed after a few minutes.

Scenario 2: Validated with a metadata split-brain where the metadata is the same on all bricks (file name: file_meta1).

After enabling the self-heal daemon, below is the heal info:

```
# getfattr -d -m . -e hex /bricks/brick1/b0/file_meta1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/b0/file_meta1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000000000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xd9481da807e34544bb389bf1763b4d91
trusted.gfid2path.76538e835da1a595=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f66696c655f6d65746131

# getfattr -d -m . -e hex /bricks/brick1/b1/file_meta1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/b1/file_meta1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-0=0x000000000000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xd9481da807e34544bb389bf1763b4d91
trusted.gfid2path.76538e835da1a595=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f66696c655f6d65746131

# date;gluster vol heal 12 info
Fri Apr 20 04:56:29 EDT 2018
Brick 10.70.35.61:/bricks/brick1/b0
/file_meta1
/file_meta2 - Is in split-brain
Status: Connected
Number of entries: 2

Brick 10.70.35.174:/bricks/brick1/b1
<gfid:d9481da8-07e3-4544-bb38-9bf1763b4d91>
<gfid:86433c76-d1ed-4be2-a7dd-a80d0ab3e80e> - Is in split-brain
Status: Connected
Number of entries: 2
```

file_meta1 was healed after a few minutes:

```
[root@dhcp35-163 ~]# date;gluster vol heal 12 info
Fri Apr 20 05:48:20 EDT 2018
Brick 10.70.35.61:/bricks/brick1/b0
/file_meta2 - Is in split-brain
Status: Connected
Number of entries: 1

Brick 10.70.35.174:/bricks/brick1/b1
/file_meta2 - Is in split-brain
Status: Connected
Number of entries: 1
```

Changing status to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
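
For reference when reading the getfattr outputs quoted above: each `trusted.afr.<volume>-client-N` value packs three big-endian 32-bit counters, recording pending data, metadata, and entry operations that this brick blames on the named peer. A small stand-alone decoder, assuming only that layout (this is not GlusterFS code):

```c
/* Decode a trusted.afr.* value: twelve bytes holding three big-endian
 * 32-bit counters of pending data, metadata, and entry operations. */
#include <stdint.h>
#include <stdio.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(void)
{
    /* trusted.afr.12-client-1=0x000000010000000000000000 (Scenario 1):
     * one pending data operation blamed on the peer. */
    const uint8_t data_blame[12] = {0,0,0,1, 0,0,0,0, 0,0,0,0};

    /* trusted.afr.12-client-1=0x000000000000000100000000 (Scenario 2):
     * one pending metadata operation blamed on the peer. */
    const uint8_t meta_blame[12] = {0,0,0,0, 0,0,0,1, 0,0,0,0};

    printf("scenario 1: data=%u metadata=%u entry=%u\n",
           be32(data_blame), be32(data_blame + 4), be32(data_blame + 8));
    printf("scenario 2: data=%u metadata=%u entry=%u\n",
           be32(meta_blame), be32(meta_blame + 4), be32(meta_blame + 8));
    return 0;
}
```

Mutual non-zero counters, as seen for file1 and file_meta1 above, are the classic split-brain signature; the fix resolves that signature automatically when the dispute is provably moot.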
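The verified behavior amounts to two shortcuts in the split-brain decision. The sketch below is a hypothetical condensation, not the actual patch: the `replica_copy` type and both helper names are invented here, and the real logic lives in the two upstream patches linked above.

```c
/* Illustrative sketch of the two shortcuts verified above; types and
 * helpers are invented for illustration, not taken from GlusterFS. */
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

struct replica_copy {
    bool   readable; /* lookup on this brick succeeded */
    off_t  size;     /* file size on this brick        */
    uid_t  uid;
    gid_t  gid;
    mode_t mode;
};

/* Scenario 1: if every copy of a data split-brain file is zero bytes,
 * the copies cannot actually differ in contents, so any brick can be
 * picked as the heal source instead of reporting split-brain. */
static bool
data_split_brain_resolvable(const struct replica_copy *c, int n)
{
    for (int i = 0; i < n; i++)
        if (!c[i].readable || c[i].size != 0)
            return false;
    return true;
}

/* Scenario 2: if a metadata split-brain file carries identical
 * uid/gid/mode on every brick, the dispute is moot and heal can
 * likewise pick any source. */
static bool
metadata_split_brain_resolvable(const struct replica_copy *c, int n)
{
    for (int i = 1; i < n; i++) {
        if (!c[0].readable || !c[i].readable)
            return false;
        if (c[i].uid != c[0].uid || c[i].gid != c[0].gid ||
            c[i].mode != c[0].mode)
            return false;
    }
    return true;
}

int main(void)
{
    /* Both bricks truncated to zero, as in Scenario 1. */
    struct replica_copy c[2] = {
        { true, 0, 0, 0, 0644 },
        { true, 0, 0, 0, 0644 },
    };
    printf("auto-resolvable: data=%d metadata=%d\n",
           data_split_brain_resolvable(c, 2),
           metadata_split_brain_resolvable(c, 2));
    return 0;
}
```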