Bug 1716360 - Arbiter becoming source of heal when bricks are brought down continuously
Summary: Arbiter becoming source of heal when bricks are brought down continuously
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: arbiter
Version: rhgs-3.4
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Karthik U S
QA Contact: Prasanth
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-03 10:22 UTC by Anees Patel
Modified: 2022-03-22 04:48 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-31 05:14:42 UTC
Embargoed:


Attachments (Terms of Use)
Script to continuously append to a file (397 bytes, application/x-perl)
2019-06-03 10:22 UTC, Anees Patel

Description Anees Patel 2019-06-03 10:22:36 UTC
Created attachment 1576564 [details]
Script to continuously append to a file

Description of problem:

When two bricks in an arbiter volume are brought down repeatedly, the AFR extended attributes on both data bricks blame each other and the arbiter becomes the source of heal. A similar issue was found earlier in BZ#1401969 and was fixed in 3.4.0.

Version-Release number of selected component (if applicable):

Discovered while testing a hotfix build:
# rpm -qa | grep gluster
python2-gluster-3.12.2-40.el7rhgs.1.HOTFIX.sfdc02320997.bz1708121.x86_64
glusterfs-3.12.2-40.el7rhgs.1.HOTFIX.sfdc02320997.bz1708121.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run a script that collects all the bricks in the volume and kills two bricks (b0, b1) milliseconds apart; bring the bricks back with a glusterd restart.
2. Now kill b1 and b2, and repeat the cycle in a loop.
3. In parallel, run the attached perl script on a FUSE client as I/O; the script opens a file and appends to it in an infinite loop.
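The brick-kill cycle in steps 1-2 might be scripted roughly as below. This is a hedged sketch, not the actual test script: the node IPs are taken from the volume info in this report, the glusterfsd match pattern is an assumption, and `RUN` is a hypothetical indirection so the loop body can be dry-run with `RUN=echo` instead of ssh.

```shell
# Sketch of the reproduction cycle (assumption: bricks run as glusterfsd
# processes reachable over ssh; RUN=echo allows a dry run).
RUN=${RUN:-ssh}
cycle_bricks() {
  a=$1; b=$2
  $RUN "$a" "kill -9 \$(pgrep -f glusterfsd)"   # kill brick on node $a...
  $RUN "$b" "kill -9 \$(pgrep -f glusterfsd)"   # ...and node $b milliseconds later
  $RUN "$a" "systemctl restart glusterd"        # glusterd restart respawns the brick
  $RUN "$b" "systemctl restart glusterd"
}
# Alternate the brick pair each iteration, as described in the steps, e.g.:
# while true; do cycle_bricks 10.70.36.49 10.70.36.62; sleep 30; done
```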

Actual results:

The file is pending heal and cannot be accessed from the mount point; the arbiter has become the source of heal.
# ls 1
ls: cannot access 1: Transport endpoint is not connected
# stat 1
stat: cannot stat ‘1’: Transport endpoint is not connected


# gluster v heal master2vol-2 info
Brick 10.70.36.49:/bricks/brick1/master1vol-2
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Brick 10.70.36.62:/bricks/brick3/master1vol-2repl
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Brick 10.70.36.56:/bricks/brick1/master1vol-2
<gfid:90959a41-63dc-4fe0-b6d9-f1223b1ab40f>
Status: Connected
Number of entries: 1

Expected results:

The arbiter brick should not become the source of heal, and all files should heal.

Additional info:
================
# gluster v info master2vol-2
 
Volume Name: master2vol-2
Type: Replicate
Volume ID: 0f62e637-15ae-4c64-828b-f7d83e08baf4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.49:/bricks/brick1/master1vol-2
Brick2: 10.70.36.62:/bricks/brick3/master1vol-2repl
Brick3: 10.70.36.56:/bricks/brick1/master1vol-2 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
cluster.shd-max-threads: 30
cluster.enable-shared-storage: enable

=============================================================================
The extended attributes for the file blame each other (client-0 and client-1), and the dirty attribute is set on all bricks.
Data-brick 1
# getfattr -m . -d -e hex /bricks/brick1/master1vol-2/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/master1vol-2/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002d0000000000000000
trusted.afr.master2vol-2-client-1=0x000000020000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343a9000264bd
==
Data-brick 2
# getfattr -m . -d -e hex /bricks/brick3/master1vol-2repl/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/master1vol-2repl/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002c0000000000000000
trusted.afr.master2vol-2-client-0=0x000000010000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343ae0005e2f7
==
Arbiter brick
# getfattr -m . -d -e hex /bricks/brick1/master1vol-2/replace-brick/1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/master1vol-2/replace-brick/1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x0000002c0000000000000000
trusted.afr.master2vol-2-client-0=0x000000010000000000000000
trusted.afr.master2vol-2-client-1=0x000000020000000000000000
trusted.gfid=0x90959a4163dc4fe0b6d9f1223b1ab40f
trusted.gfid2path.d6e66232a352f62e=0x31363365353336322d393862312d343836652d393061392d3437313437633165306662302f31
trusted.glusterfs.0f62e637-15ae-4c64-828b-f7d83e08baf4.xtime=0x5cf343ac000e9301
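To make the blame above readable: each `trusted.afr.*` xattr packs three big-endian 32-bit counters (pending data, metadata, and entry operations). A small bash helper, assumed here only for illustration, decodes the hex values shown above:

```shell
# Decode an AFR changelog xattr (0x + 24 hex chars) into its three
# 32-bit counters: pending data, metadata, and entry operations.
decode_afr() {
  h=${1#0x}                      # strip the leading 0x
  data=$((16#${h:0:8}))
  meta=$((16#${h:8:8}))
  entry=$((16#${h:16:8}))
  echo "data=$data meta=$meta entry=$entry"
}
decode_afr 0x0000002d0000000000000000   # dirty on data-brick 1 -> data=45 meta=0 entry=0
decode_afr 0x000000020000000000000000   # blame on client-1     -> data=2 meta=0 entry=0
```

So data-brick 1 blames client-1 for 2 pending data operations, data-brick 2 blames client-0 for 1, and the arbiter blames both, which is the split-brain-like state in which it wrongly qualifies as a heal source.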
=============================================================================
System details and the sos-report to be provided in a following comment.

