Bug 1293349 - AFR can ignore zero-size files while checking for split-brain
Status: ON_QA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Ravishankar N
QA Contact: Vijay Avuthu
rebase
: ZStream
Blocks: 1503134
Reported: 2015-12-21 09:03 EST by RajeshReddy
Modified: 2018-01-29 15:12 EST
CC List: 11 users

Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Type: Bug


Attachments: None
Description RajeshReddy 2015-12-21 09:03:05 EST
Description of problem:
===============
I/O can still be performed on a file that is in split-brain.


Version-Release number of selected component (if applicable):
==========
glusterfs-server-3.7.5-12.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
=============
1. Create a 1x2 replicate volume, attach a 1x2 hot tier to it, and mount it on a client over NFS.
2. From the mount, create a directory and create a file in it.
3. Bring down the bricks (both hot and cold) on node1, then append content to the file from the mount.
4. Bring the downed bricks back up using the "gluster vol start <vol> force" command, then immediately bring down the bricks (both hot and cold) on the other node (node2).

At this point the file was migrated from the hot tier to the cold tier.

5. Bring the downed bricks back up using the "gluster vol start <vol> force" command. After this, node1 has the actual file on both the hot and cold tiers, while node2 has a link file on the cold tier and the actual file on the hot tier.

6. Run "gluster vol heal <vol>" and check the heal status with "gluster vol heal <vol> info". The file is shown as being in split-brain, yet I/O to the file still succeeds.



Expected results:
=========
I/O to a file should not be allowed while it is in split-brain.


Additional info:
===========
[root@tettnang ~]# gluster vol info afr1x2_tier 
 
Volume Name: afr1x2_tier
Type: Tier
Volume ID: 5d6db910-948c-484e-9672-0011ba3b7a09
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
Brick2: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Brick4: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Options Reconfigured:
cluster.watermark-hi: 12
cluster.watermark-low: 10
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.self-heal-daemon: on


getfattr output from rhs-client18
=================================
[root@rhs-client18 split]# getfattr -d -m . -e hex  /rhs/brick7/afr1x2_tier_cold/new/one.txt 
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick7/afr1x2_tier_cold/new/one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-0=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005677f468000a1741
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87

[root@rhs-client18 split]# getfattr -d -m . -e hex /rhs/brick6/afr1x2_tier_hot/new/one.txt 
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/afr1x2_tier_hot/new/one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-2=0x000000010000000000000000
trusted.afr.dirty=0x000000030000000000000000
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400


getfattr output from rhs-client19
=================================

[root@rhs-client19 ~]# getfattr -d -m . -e hex  /rhs/brick7/afr1x2_tier_cold/new/one.txt
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick7/afr1x2_tier_cold/new/one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005677f4f6000325d0
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87

[root@rhs-client19 ~]# getfattr -d -m . -e hex /rhs/brick6/afr1x2_tier_hot/new/one.txt 
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick6/afr1x2_tier_hot/new/one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-3=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005677f3ad0007f386
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400
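The hex values in the getfattr dumps above can be decoded to see what each brick records. This is a minimal sketch assuming the usual AFR on-disk layout: every trusted.afr.<volname>-client-N value (and trusted.afr.dirty) is 12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations. It also decodes the NUL-terminated trusted.tier.tier-dht.linkto value and renders trusted.gfid in the form that heal info prints. The helper names are illustrative only, not part of GlusterFS.

```python
import struct
import uuid


def _hex_to_bytes(hex_value: str) -> bytes:
    """Convert a value printed by 'getfattr -e hex' (e.g. '0x0001...') to bytes."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    return bytes.fromhex(hex_value)


def decode_afr_changelog(hex_value: str) -> dict:
    """Decode a trusted.afr.* changelog value.

    Assumption: 12 bytes = three big-endian 32-bit counters for pending
    data, metadata and entry operations, in that order.
    """
    raw = _hex_to_bytes(hex_value)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def decode_linkto(hex_value: str) -> str:
    """Decode a NUL-terminated linkto xattr into the target subvolume name."""
    return _hex_to_bytes(hex_value).rstrip(b"\x00").decode("ascii")


def decode_gfid(hex_value: str) -> str:
    """Render a 16-byte trusted.gfid in the UUID form shown by heal info."""
    return str(uuid.UUID(bytes=_hex_to_bytes(hex_value)))


if __name__ == "__main__":
    # Values copied from the hot-tier bricks in this report.
    print(decode_afr_changelog("0x000000010000000000000000"))  # client-2 xattr on rhs-client18
    print(decode_afr_changelog("0x000000030000000000000000"))  # dirty xattr on rhs-client18
    print(decode_linkto("0x6166723178325f746965722d636f6c642d64687400"))
    print(decode_gfid("0xd9e9a2300e464abe810113910ed25f87"))
```

On these values, each hot-tier brick holds a data counter of 1 against a different client index (client-2 on rhs-client18, client-3 on rhs-client19), which is consistent with the mutual blame that heal info reports as split-brain. The linkto value decodes to afr1x2_tier-cold-dht, and the GFID decodes to d9e9a230-0e46-4abe-8101-13910ed25f87, matching the heal info entry.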

Mount
=======

[root@vertigo new]# cat one.txt 
When all are up and running
Brick from 18 down & file is on hot tier
Brick from 19 down & file is on hot tier
Brick from 19 down & file is on hot tier new one  new
Comment 2 RajeshReddy 2015-12-21 09:26:47 EST
The sosreport is available at /home/repo/sosreports/bug.1293349 on rhsqe-repo.lab.eng.blr.redhat.com.
Comment 4 RajeshReddy 2015-12-28 02:01:25 EST
Yes, the file on the hot tier is in split-brain. On one node the cold tier contains the actual file, and on the other node it contains a link file.



[root@rhs-client18 ~]# gluster vol heal afr1x2_tier info split-brain
Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0
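When comparing several runs of "heal info split-brain", it helps to map each brick to the entries it lists. A minimal parser, assuming the layout shown above (a "Brick" line, zero or more entry lines, then a "Number of entries in split-brain:" line); "Status:" lines, which some releases may print after the Brick line, are skipped. The function name is hypothetical.

```python
from typing import Dict, List, Optional


def parse_split_brain_info(text: str) -> Dict[str, List[str]]:
    """Map each brick in 'heal info split-brain' output to its listed entries."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            result[current] = []
        elif line.startswith("Number of entries in split-brain:"):
            # Cross-check the reported count against the entries collected.
            expected = int(line.split(":", 1)[1])
            if current is not None and expected != len(result[current]):
                raise ValueError(f"count mismatch for {current}")
        elif line.startswith("Status:"):
            continue
        elif line and current is not None:
            result[current].append(line)
    return result


if __name__ == "__main__":
    sample = """Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0
"""
    for brick, entries in parse_split_brain_info(sample).items():
        print(brick, entries)
```

Applied to the full output above, only the two afr1x2_tier_hot bricks list the GFID; both cold-tier bricks report no entries.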
Comment 6 RajeshReddy 2015-12-28 03:51:47 EST
"gluster vol heal afr1x2_tier info split-brain" shows the file in split-brain, and moreover the file is not getting promoted.

In a replica volume the user expects both nodes to contain the same data, but in this case the two nodes do not have the same data.
Comment 7 Pranith Kumar K 2015-12-28 04:48:19 EST
Rajesh,
     I think we are on the same page. The file shown in split-brain is a link file, not the data file. And yes, the file won't get promoted until the split-brain on the hot tier is resolved. I didn't understand your point that in a replica volume the user expects both nodes to contain the same data. Are you saying the two bricks that are in replication don't have the same data?

Pranith
Comment 8 RajeshReddy 2015-12-28 05:10:00 EST
Earlier I was seeing differences between the two bricks in the replica (one brick contained the actual file and the other contained a link file), but now both bricks have the same data.
Comment 9 RajeshReddy 2015-12-29 02:23:42 EST
Once a file is in split-brain, promotions will fail, which is expected. Here, however, the files in split-brain are zero-size files, so AFR can ignore such files while checking for split-brain.

[root@rhs-client18 tier]# gluster vol heal afr1x2_tier info split-brain
Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/afr1x2_tier_hot
<gfid:d9e9a230-0e46-4abe-8101-13910ed25f87>
Number of entries in split-brain: 1

Brick rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr1x2_tier_cold
Number of entries in split-brain: 0

[root@rhs-client18 tier]# cd /rhs/brick6/afr1x2_tier_hot
[root@rhs-client18 afr1x2_tier_hot]# ls
big  new  split  test
[root@rhs-client18 afr1x2_tier_hot]# cd new/
[root@rhs-client18 new]# ls -lrth
total 4.0K
---------T. 2 root root 0 Dec 21 18:18 one.txt
[root@rhs-client18 new]# getfattr -d -m . -e hex one.txt 
# file: one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-2=0x000000010000000000000000
trusted.afr.dirty=0x000000030000000000000000
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400


[root@rhs-client19 ~]# cd /rhs/brick6/afr1x2_tier_hot/new
[root@rhs-client19 new]# getfattr -d -m . -e hex one.txt
# file: one.txt
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.afr1x2_tier-client-3=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005677f3ad0007f386
trusted.gfid=0xd9e9a2300e464abe810113910ed25f87
trusted.tier.tier-dht.linkto=0x6166723178325f746965722d636f6c642d64687400
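The hot-tier copy listed above is a zero-byte file whose mode shows only the sticky bit (---------T) and which carries a trusted.tier.tier-dht.linkto xattr, i.e. a DHT link file rather than the data file. A rough heuristic check, assuming those three traits are enough to identify a link file, could look like this; the function names are hypothetical and this is not GlusterFS's own check.

```python
import os
import stat
from typing import Iterable


def looks_like_dht_linkfile(st_mode: int, st_size: int, xattr_names: Iterable[str]) -> bool:
    """Heuristic: regular file, zero bytes, permission bits equal to the
    sticky bit alone (ls shows ---------T), and a *.linkto xattr present."""
    sticky_only = stat.S_IMODE(st_mode) == stat.S_ISVTX
    has_linkto = any(name.endswith(".linkto") for name in xattr_names)
    return stat.S_ISREG(st_mode) and sticky_only and st_size == 0 and has_linkto


def check_brick_file(path: str) -> bool:
    """Apply the heuristic to a file on a brick.

    Linux only; listing trusted.* xattrs requires root.
    """
    st = os.lstat(path)
    names = os.listxattr(path, follow_symlinks=False)
    return looks_like_dht_linkfile(st.st_mode, st.st_size, names)
```

Run as root against /rhs/brick6/afr1x2_tier_hot/new/one.txt on either node, check_brick_file would be expected to return True, since both hot-tier copies have the mode, size and xattr shown above.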
Comment 10 Pranith Kumar K 2015-12-29 02:30:01 EST
We can take that as an enhancement: if the files are in data split-brain and both copies have zero size, AFR will resolve the split-brain automatically. Could you change the bug description to reflect this?

Pranith
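The proposed enhancement can be modelled as a small decision function. This is a simplified sketch, not the logic of the upstream patches: ReplicaCopy and data_heal_decision are hypothetical, and "blames" stands for the bricks against which a copy's trusted.afr.<volname>-client-N xattr holds a non-zero data counter. A copy that no other copy blames can serve as the heal source; if every copy is blamed (data split-brain) but all copies are zero bytes, their contents cannot differ, so any copy can be picked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ReplicaCopy:
    """One brick's copy of a file (hypothetical model for illustration)."""
    brick: str
    size: int
    blames: Set[str] = field(default_factory=set)  # bricks with pending data ops


def data_heal_decision(copies: List[ReplicaCopy]) -> Tuple[str, Optional[str]]:
    """Return (action, source_brick) for a data heal."""
    if not copies:
        raise ValueError("no copies given")
    blamed = set().union(*(c.blames for c in copies))
    sources = [c for c in copies if c.brick not in blamed]
    if sources:
        # At least one copy is unblamed: heal the others from it.
        return "heal", sources[0].brick
    # Every copy is blamed by another one: data split-brain.
    if all(c.size == 0 for c in copies):
        # Proposed enhancement: empty copies are identical, so pick any.
        return "auto-resolve", copies[0].brick
    return "split-brain", None
```

For the two hot-tier link files in this bug (0 bytes, each blaming the other), data_heal_decision([ReplicaCopy("hot-18", 0, {"hot-19"}), ReplicaCopy("hot-19", 0, {"hot-18"})]) returns ("auto-resolve", "hot-18") instead of reporting split-brain; the same pair with non-zero sizes still returns ("split-brain", None).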
Comment 14 Ravishankar N 2017-09-14 08:13:45 EDT
Upstream patch https://review.gluster.org/#/c/18283
Comment 15 Ravishankar N 2017-10-11 06:00:26 EDT
(In reply to Ravishankar N from comment #14)
> Upstream patch https://review.gluster.org/#/c/18283


There is also a follow-up patch: https://review.gluster.org/#/c/18391/ (so two patches in total for this bug). Note that the fixes were sent as part of fixing BZ 1482812.
