Bug 1610743 - Directory is incorrectly reported as in split-brain when dirty marking is there
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 1
Assignee: Ravishankar N
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-01 11:19 UTC by Vijay Avuthu
Modified: 2018-10-31 08:47 UTC (History)

Fixed In Version: glusterfs-3.12.2-20
Doc Type: Bug Fix
Doc Text:
Previously, when directories had dirty markers set on them, either because of AFR transaction failures or because a replace-brick or reset-brick operation was performed, heal info reported them as being in split-brain. With this fix, heal info no longer treats the presence of dirty markers as an indication of split-brain and does not list these entries as being in split-brain.
Clone Of:
Environment:
Last Closed: 2018-10-31 08:46:14 UTC


Attachments
gluster-health-report (4.70 KB, text/plain)
2018-08-01 11:19 UTC, Vijay Avuthu


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:3432 None None None 2018-10-31 08:47:58 UTC

Description Vijay Avuthu 2018-08-01 11:19:35 UTC
Created attachment 1472059
gluster-health-report

Description of problem:

Split-brain was observed on the parent directory while verifying bug 1566336.


Version-Release number of selected component (if applicable):

Build used: glusterfs-3.12.2-15.el7rhgs.x86_64



How reproducible: Always


Steps to Reproduce:

1) Create a 1 x 3 volume and start it.
2) Disable all client-side heals and create a directory from the client.
3) Fill two of the bricks (b1 and b2) from the back end.
4) From the mount point, create a file inside the directory. The create should fail with "No Space", but the name entry is still created on b0.
5) Check heal info; it should list the file from step 4.
6) Check the changelog xattrs of the parent directory; the dirty bit should be set.
7) Free space on b1 and b2 by removing the previously created files from the back end.
8) Trigger a heal; the file created in step 4 should be healed.
9) The dirty bit should be cleared from the directory.
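
For steps 2, 5 and 8, the following is a minimal sketch of the gluster CLI invocations involved, built as argument lists so they can be reviewed before being run. It assumes that "disable all client side heals" means turning off the three cluster.*-self-heal options shown later in the volume info output; the helper names are hypothetical and not part of any gluster tooling.

```python
# Sketch only: builds command lines, does not execute them.
# Assumption: the three options below are the "client side heals" from step 2
# (they match the "Options Reconfigured" list in the volume info output).
HEAL_OPTIONS = (
    "cluster.data-self-heal",
    "cluster.metadata-self-heal",
    "cluster.entry-self-heal",
)


def disable_client_heal_commands(volume):
    """Return one `gluster volume set` argv list per client-side heal option."""
    return [["gluster", "volume", "set", volume, opt, "off"] for opt in HEAL_OPTIONS]


def heal_check_commands(volume):
    """Return argv lists for listing pending heals (step 5) and triggering a heal (step 8)."""
    return [
        ["gluster", "volume", "heal", volume, "info"],
        ["gluster", "volume", "heal", volume],
    ]


if __name__ == "__main__":
    # "13" is the volume name used in this report.
    for argv in disable_client_heal_commands("13") + heal_check_commands("13"):
        print(" ".join(argv))
```

Each list can be passed to subprocess.run() on a node where the gluster CLI is installed.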

Actual results:

At step 5, split-brain was observed on the parent directory.

Expected results:

The parent directory should not be in split-brain.

Additional info:

5)
# gluster vol heal 13 info 
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1 
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
# 
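
When checking for this symptom across many bricks, the heal info text above can be scanned mechanically. This is a rough sketch that assumes the output layout shown in this report (a "Brick ..." header, entry lines starting with "/" or "<gfid:", and the " - Is in split-brain" suffix); the function name is hypothetical.

```python
SPLIT_BRAIN_SUFFIX = " - Is in split-brain"

# Sample copied from the step 5 output in this report.
SAMPLE = """\
Brick rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
/test/test1 
/test - Is in split-brain

Status: Connected
Number of entries: 2

Brick rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Status: Connected
Number of entries: 0

Brick rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Status: Connected
Number of entries: 0
"""


def parse_heal_info(text):
    """Map each brick to its pending entries and the subset flagged as split-brain."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = {"entries": [], "split_brain": []}
        elif current is not None and line.startswith(("/", "<gfid:")):
            if line.endswith(SPLIT_BRAIN_SUFFIX):
                line = line[: -len(SPLIT_BRAIN_SUFFIX)]
                bricks[current]["split_brain"].append(line)
            bricks[current]["entries"].append(line)
        # Status, entry-count, blank and prompt lines are ignored.
    return bricks


report = parse_heal_info(SAMPLE)
for brick, info in report.items():
    print(brick, info["entries"], info["split_brain"])
```

On the sample, only the b0 brick lists entries, and /test is the only path flagged as split-brain.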

6) 
# getfattr -d -m . -e hex /bricks/brick0/b0/test/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0x76b00d4743754753a99f8b5f74f2f6bd
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick0/b0/test/test1 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/b0/test/test1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.13-client-1=0x000000010000000100000000
trusted.afr.13-client-2=0x000000010000000100000000
trusted.gfid=0xfa4dad2a2e3e47ada319c5bc2ca9e2b1
trusted.gfid2path.8c8c0ebfbd194a1f=0x37366230306434372d343337352d343735332d613939662d3862356637346632663662642f7465737431
#
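
To read the trusted.afr.* values above, the sketch below splits each 12-byte value into three big-endian 32-bit counters. It assumes the usual AFR changelog layout of data, metadata and entry counters, in that order; the function name is hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte AFR changelog xattr into data/metadata/entry counters.

    Assumption: three big-endian 32-bit counters in the order
    data, metadata, entry.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output in this report.
dirty = decode_afr_changelog("0x000000000000000000000001")
pending = decode_afr_changelog("0x000000010000000100000000")
print("trusted.afr.dirty on /test:", dirty)
print("trusted.afr.13-client-1 on /test/test1:", pending)
```

Under that assumption, the directory carries only an entry dirty marker and no trusted.afr.13-client-* xattrs blaming another brick, while the file on b0 records pending data and metadata operations against client-1 and client-2.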

# gluster vol info 13
 
Volume Name: 13
Type: Replicate
Volume ID: 620301ee-9a31-4320-85cd-1beedcd93cdf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsauto025.lab.eng.blr.redhat.com:/bricks/brick0/b0
Brick2: rhsauto024.lab.eng.blr.redhat.com:/bricks/brick0/b1
Brick3: rhsauto026.lab.eng.blr.redhat.com:/bricks/brick0/b2
Options Reconfigured:
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
cluster.data-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
# 


SOS Report:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/split_brain_on_bricks_full/

Comment 5 Ravishankar N 2018-09-10 09:21:20 UTC
Attempted an upstream fix via https://review.gluster.org/21135 (BZ 1626994).

Comment 10 Anees Patel 2018-10-05 12:57:27 UTC
Verified the fix, see below.

Build used glusterfs-3.12.2-21.el7rhgs.x86_64

At step 5 from the bug description, no split-brain is reported by heal info.

# gluster vol heal replicate_bug info
Brick 10.70.47.133:/bricks/brick3/day4
Status: Connected
Number of entries: 0

Brick 10.70.46.168:/bricks/brick3/day4
/dir1/300mbfile 
/dir1 
Status: Connected
Number of entries: 2

Brick 10.70.47.102:/bricks/brick3/day4
Status: Connected
Number of entries: 0

The dirty bit is also visible at step 6, which is expected.

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.gfid=0xbd6c8c8584b7476c9eba9c8d128e5765
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# getfattr -d -m . -e hex /bricks/brick3/day4/dir1/300mbfile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/day4/dir1/300mbfile
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.replicate_bug-client-0=0x000000010000000100000000
trusted.afr.replicate_bug-client-2=0x000000010000000100000000
trusted.gfid=0x1c851d3ae1df4abb93f204761a156d03
trusted.gfid2path.a63e0b78c611e2f2=0x62643663386338352d383462372d343736632d396562612d3963386431323865353736352f3330306d6266696c65

Moving the bug to VERIFIED.
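
As a cross-check on the getfattr output in this comment, the trusted.gfid2path.* value can be decoded to confirm that the file is recorded under the directory carrying the dirty marker. This sketch assumes the value is the ASCII string "<parent-gfid>/<basename>" in hex, which is what these samples decode to; the function names are hypothetical.

```python
import uuid


def _strip_0x(hex_value):
    """Drop a leading 0x/0X prefix as printed by `getfattr -e hex`."""
    return hex_value[2:] if hex_value.lower().startswith("0x") else hex_value


def decode_gfid2path(hex_value):
    """Return (parent_gfid, basename) from a hex-encoded gfid2path xattr value."""
    text = bytes.fromhex(_strip_0x(hex_value)).decode("utf-8")
    parent_gfid, _, basename = text.partition("/")
    return parent_gfid, basename


def gfid_to_uuid(hex_value):
    """Format a 16-byte trusted.gfid value as a canonical UUID string."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


# Values copied from the getfattr output in this comment.
parent, name = decode_gfid2path(
    "0x62643663386338352d383462372d343736632d396562612d"
    "3963386431323865353736352f3330306d6266696c65"
)
dir_gfid = gfid_to_uuid("0xbd6c8c8584b7476c9eba9c8d128e5765")
print(parent, name, dir_gfid)
```

Here the decoded parent GFID matches the trusted.gfid of /dir1, and the basename is 300mbfile.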

Comment 12 Ravishankar N 2018-10-11 05:38:28 UTC
Looks good to me.

Comment 14 errata-xmlrpc 2018-10-31 08:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432

