Bug 1610743

Summary: Directory is incorrectly reported as in split-brain when dirty markers are present
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: replicate
Version: rhgs-3.4
Target Release: RHGS 3.4.z Batch Update 1
Fixed In Version: glusterfs-3.12.2-20
Status: CLOSED ERRATA
Type: Bug
Severity: medium
Priority: unspecified
Keywords: ZStream
Reporter: Vijay Avuthu <vavuthu>
Assignee: Ravishankar N <ravishankar>
QA Contact: Vijay Avuthu <vavuthu>
CC: anepatel, apaladug, chpai, ravishankar, rhs-bugs, sanandpa, sankarshan, sheggodu, storage-qa-internal, vdas
Last Closed: 2018-10-31 08:46:14 UTC
Doc Type: Bug Fix
Doc Text:
Previously, when directories had dirty markers set on them due to AFR transaction failures, or when a replace-brick or reset-brick operation was performed, heal info reported them as being in split-brain. With this fix, heal info no longer treats the presence of dirty markers as an indication of split-brain and does not list these entries as being in split-brain.
Attachments: gluster-health-report

Description Vijay Avuthu 2018-08-01 11:19:35 UTC
Created attachment 1472059 [details]
gluster-health-report

Description of problem:

Split-brain was observed on the parent directory while verifying bug 1566336.


Version-Release number of selected component (if applicable):

Build used: glusterfs-3.12.2-15.el7rhgs.x86_64



How reproducible: Always


Steps to Reproduce:

1) Create a 1 x 3 replicate volume and start it (a scripted sketch of steps 1-6 follows this list).
2) Disable all client-side heals and create a directory from the client.
3) Fill two of the bricks (b1 and b2) from the back end.
4) From the mount point, create a file inside the directory; the create should fail with "No space" but the name entry is created on b0.
5) Check heal info; it should list the above file.
6) Check the changelogs of the parent directory; the dirty bit should be set.
7) Make space in b1 and b2 by removing the previously created files from the back end.
8) Trigger a heal; the file created in step 4 should be healed.
9) The dirty bit should be cleared from the directory.
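
A minimal scripted sketch of steps 1-6 (assumptions: server1..server3 are placeholder hosts; brick paths and the volume name "13" are taken from the outputs further below):

# 1) Create a 1 x 3 replicate volume and start it.
gluster volume create 13 replica 3 \
    server1:/bricks/brick0/b0 \
    server2:/bricks/brick0/b1 \
    server3:/bricks/brick0/b2
gluster volume start 13

# 2) Disable all client-side heals, mount the volume, create a dir.
gluster volume set 13 cluster.data-self-heal off
gluster volume set 13 cluster.metadata-self-heal off
gluster volume set 13 cluster.entry-self-heal off
mount -t glusterfs server1:/13 /mnt/13
mkdir /mnt/13/test

# 3) Fill b1 and b2 from the back end (on server2/server3), e.g.
#    dd if=/dev/zero of=/bricks/brick0/b1/filler bs=1M   (until ENOSPC)

# 4) Create a file from the mount point; the write fails with
#    "No space left on device" but the name entry lands on b0.
touch /mnt/13/test/test1

# 5) Heal info lists the file (and, before the fix, wrongly reports
#    the parent directory as in split-brain).
gluster volume heal 13 info

# 6) The dirty marker should be set on the parent directory on b0.
getfattr -d -m . -e hex /bricks/brick0/b0/test/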

Actual results:

At step 5, split-brain is reported on the parent directory.

Expected results:

The parent directory should not be reported as being in split-brain.

Additional info:

5) Heal info output:
# gluster vol heal 13 info 
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1 
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
# 

6) Changelog xattrs on the parent directory and the new file:
# getfattr -d -m . -e hex /bricks/brick0/b0/test/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000010000000100000000
trusted.afr.13-client-2=0x000000010000000100000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
#
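
(For reference: each trusted.afr.* value packs three big-endian 32-bit pending counters, in the order data/metadata/entry. The dirty marker on the directory above therefore decodes to entry=1, and the file's changelogs to data=1, metadata=1 against clients 1 and 2. A small bash sketch, with decode_afr as a hypothetical helper:)

decode_afr() {
    # Strip the 0x prefix and split the 12-byte value into three
    # 32-bit hex fields: data, metadata, entry.
    v=${1#0x}
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
}
decode_afr 0x000000000000000000000001   # dirty           -> entry=1
decode_afr 0x000000010000000100000000   # 13-client-1/2   -> data=1 metadata=1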

# gluster vol info 13
 
Volume Name: 13
Type: Replicate
Volume ID: 620301ee-9a31-4320-85cd-1beedcd93cdf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
Brick2: rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Brick3: rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Options Reconfigured:
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
cluster.data-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
# 


SOS Report:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/split_brain_on_bricks_full/

Comment 5 Ravishankar N 2018-09-10 09:21:20 UTC
Attempted an upstream fix via https://review.gluster.org/21135 (BZ 1626994).

Comment 10 Anees Patel 2018-10-05 12:57:27 UTC
Verified the fix; see below.

Build used: glusterfs-3.12.2-21.el7rhgs.x86_64

At step 5 from the bug description, no split-brain is reported by heal info:

# gluster vol heal replicate_bug info
Brick 10.70.47.133:/bricks/brick3/day4
Status: Connected
Number of entries: 0

Brick 10.70.46.168:/bricks/brick3/day4
/dir1/300mbfile 
/dir1 
Status: Connected
Number of entries: 2

Brick 10.70.47.102:/bricks/brick3/day4
Status: Connected
Number of entries: 0

Also, the dirty bit is set at step 6, as expected:

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0xbd6c8c8584b7476c9eba9c8d128e5765
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1/300mbfile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1/300mbfile
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.replicate_bug-client-0=0x000000010000000100000000
trusted.afr.replicate_bug-client-2=0x000000010000000100000000
trusted.gfid=0x1c851d3ae1df4abb93f204761a156d03
trusted.gfid2path.a63e0b78c611e2f2=0x62643663386338352d383462372d343736632d396562612d3963386431323865353736352f3330306d6266696c65
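
For completeness, steps 7-9 from the bug description can be confirmed along these lines (a sketch; the volume name and paths are taken from the outputs above):

# 7) Free space on the full bricks by removing the filler files
#    from the back end, then:

# 8) Trigger an index heal and watch the pending entries drain.
gluster volume heal replicate_bug
gluster volume heal replicate_bug info

# 9) Once the heal completes, the dirty marker on the parent is
#    expected to read all zeroes.
getfattr -n trusted.afr.dirty -e hex /bricks/brick3/day4/dir1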

Moving to VERIFIED.

Comment 12 Ravishankar N 2018-10-11 05:38:28 UTC
Looks good to me.

Comment 14 errata-xmlrpc 2018-10-31 08:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432