Bug 1645480

Summary: Files pending heal in Arbiter volume
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Anees Patel <anepatel>
Component: arbiter    Assignee: Karthik U S <ksubrahm>
Status: CLOSED ERRATA QA Contact: Anees Patel <anepatel>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.4CC: amukherj, anepatel, apaladug, bkunal, ksubrahm, nchilaka, rcyriac, rhs-bugs, sanandpa, sankarshan, sheggodu, storage-qa-internal
Target Milestone: ---    Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-33 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-04 07:41:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1362129, 1655854, 1657783    
Bug Blocks:    

Description Anees Patel 2018-11-02 10:39:37 UTC
Description of problem:

Hit an issue while testing replica 3 to arbiter conversion: heal does not complete for an arbiter volume 1 x (2+1) when a rename operation is performed while a brick is down.

Version-Release number of selected component (if applicable):

# rpm -qa | grep gluster
glusterfs-3.12.2-25.el7rhgs.x86_64
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64


How reproducible:

Most of the time (3 out of 4 times)

Steps to Reproduce:
1. Disable all client-side heals
2. Create a file from the client (fuse mount) while all three bricks are up.
# echo "Hi" >>retry1
3. Bring down brick 2 and append the file
# echo "Hi2" >>retry1
4. Bring down brick 1 and bring up brick 2
Now cat on the file fails:
# cat retry1
cat: retry1: Input/output error
5. Perform rename
# mv retry1 retry2
# ls
retry2
6. After the rename, cat works on file retry2, which was previously giving an I/O error
# cat retry2 
Hi
7. Append to file retry2
# echo "W" >>retry2
Append works this time
8. Now bring all bricks up and issue a heal
9. # cat retry2 
cat: retry2: Input/output error
10. heal info also lists the file retry2 as pending heal (a consolidated shell sketch of these steps follows below)
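
For convenience, a consolidated shell sketch of the reproduction steps above. This is an illustration, not the exact commands used: the mount point /mnt/newvol is assumed, bricks are taken down by killing their glusterfsd processes (PID placeholders below), and 'gluster volume start ... force' is used to restart offline bricks, so brick 2 is brought back before brick 1 is killed (the end state matches the description since client-side heals are off).

# step 1: disable all client-side heals on the 1 x (2+1) volume "newvol"
gluster volume set newvol cluster.data-self-heal off
gluster volume set newvol cluster.metadata-self-heal off
gluster volume set newvol cluster.entry-self-heal off

# step 2: create a file from the fuse mount while all bricks are up
echo "Hi" >> /mnt/newvol/dir2/retry1

# step 3: bring down brick 2 (PID from 'gluster volume status newvol') and append
kill <pid-of-brick2-glusterfsd>
echo "Hi2" >> /mnt/newvol/dir2/retry1

# step 4: bring brick 2 back up, then bring down brick 1
gluster volume start newvol force
kill <pid-of-brick1-glusterfsd>

# steps 5-7: rename, read and append from the mount while brick 1 is down
mv /mnt/newvol/dir2/retry1 /mnt/newvol/dir2/retry2
cat /mnt/newvol/dir2/retry2
echo "W" >> /mnt/newvol/dir2/retry2

# steps 8-10: bring all bricks up, trigger heal, check heal info
gluster volume start newvol force
gluster volume heal newvol
gluster volume heal newvol info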

A similar bug was raised upstream: BZ 1357000
Actual results:
Tried this scenario four times; it hit the issue three times.
# cat retry1
cat: retry1: No such file or directory
# cat file2
cat: file2: Input/output error
# cat nile2
cat: nile2: Input/output error

Heal is pending for the renamed file, 
# gluster v heal newvol info
Brick 10.70.47.130:/bricks/brick1/newvol
Status: Connected
Number of entries: 0

Brick 10.70.46.213:/bricks/brick1/newvol
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/file2 
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/nile2 
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/retry2 
Status: Connected
Number of entries: 3

Brick 10.70.47.38:/bricks/brick1/newvol1
/dir2/file2 
/dir2/nile2 
/dir2/retry2 
Status: Connected
Number of entries: 3
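
Side note: the entries on the second brick are listed by parent-directory gfid. If needed, a gfid can be mapped back to a path via the .glusterfs backend directory on a brick (for directories the gfid entry is a symlink pointing at the parent); the brick path below is just one of the bricks listed above, used for illustration:

# b9291fe7-... is the parent gfid shown in the heal info output
ls -l /bricks/brick1/newvol/.glusterfs/b9/29/b9291fe7-06ae-4fea-b492-90882e91c299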

At step 9 the file cannot be read, hence this seems to be a data-unavailability issue.

Expected results:
At step 7 the append should not be allowed, as the brick containing the good copy is down; also, heal should complete with no files pending heal.

Additional info:

xattrs of the entries (a short decoding sketch follows the dumps below)

Brick1
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick2 
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick3 / arbiter

# getfattr -d -m . -e hex /bricks/brick1/newvol1/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol1/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.afr.newvol-client-1=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932
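
For reference, the trusted.afr.<volume>-client-N values above are AFR changelog xattrs, read here as three 32-bit big-endian counters (pending data, metadata and entry operations), and the trusted.gfid2path values are hex-encoded "<parent-gfid>/<name>" strings. A minimal bash sketch to decode them, using the values from bricks 2 and 3 above:

# AFR changelog value without the 0x prefix: data | metadata | entry pending counters
val=000000010000000000000000
echo "data pending:     $((16#${val:0:8}))"     # -> 1 (pending data heal against client-0)
echo "metadata pending: $((16#${val:8:8}))"     # -> 0
echo "entry pending:    $((16#${val:16:8}))"    # -> 0

# gfid2path value decoded to "<parent-gfid>/<filename>" (requires xxd)
echo 62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932 | xxd -r -p; echo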


Client logs:

[2018-11-02 09:56:14.772374] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2218: STAT() /dir2 => -1 (Input/output error)
[2018-11-02 09:56:15.961426] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-1: changing port to 49162 (from 0)
[2018-11-02 09:56:15.967799] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:56:15.972517] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-1: Connected to newvol-client-1, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:56:15.972553] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:56:15.972704] I [MSGID: 108002] [afr-common.c:5164:afr_notify] 4-newvol-replicate-0: Client-quorum is met
[2018-11-02 09:56:15.973448] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-1: Server lk version = 1
[2018-11-02 09:56:42.620808] W [fuse-bridge.c:1396:fuse_err_cbk] 0-glusterfs-fuse: 2285: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2018-11-02 09:57:13.664335] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-0: changing port to 49158 (from 0)
[2018-11-02 09:57:13.669657] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:57:13.670849] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-0: Connected to newvol-client-0, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:57:13.670888] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:57:13.671372] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-0: Server lk version = 1
[2018-11-02 09:57:32.177276] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing STAT on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 09:57:32.177385] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2294: STAT() /dir2/retry2 => -1 (Input/output error)
[2018-11-02 10:16:43.798183] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 10:16:43.798368] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2404: READ => -1 gfid=5e9646bf-976c-4cd8-832c-1da9b10c730f fd=0x7f552000f3b0 (Input/output error)
[2018-11-02 10:17:07.922620] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 53ba229a-a933-48a9-a7c9-2abd99c1e557: split-brain observed. [Input/output error]
[2018-11-02 10:17:07.922731] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2417: READ => -1 gfid=53ba229a-a933-48a9-a7c9-2abd99c1e557 fd=0x7f552002ac00 (Input/output error)
[2018-11-02 10:19:49.578790] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33: split-brain observed. [Input/output error]
[2018-11-02 10:19:49.578953] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2433: READ => -1 gfid=f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33 fd=0x7f552001ad20 (Input/output error)

Volume Info:
# gluster v info newvol
 
Volume Name: newvol
Type: Replicate
Volume ID: 05cd2dec-7a61-4eb0-88cd-b5b829bed17f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.130:/bricks/brick1/newvol
Brick2: 10.70.46.213:/bricks/brick1/newvol
Brick3: 10.70.47.38:/bricks/brick1/newvol1 (arbiter)
Options Reconfigured:
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Comment 2 Anees Patel 2018-11-02 10:45:05 UTC
*** Bug 1645482 has been marked as a duplicate of this bug. ***

Comment 3 Karthik U S 2018-11-09 13:23:45 UTC
I tried this locally, and I got two issues.

1. As pointed out by Anees, the arbiter becomes the source according to the xattrs that get set. This is because of the new-entry marking. When the entry got healed from the source bricks (in this case the 2nd and 3rd bricks, since they had the renamed file, while the 1st brick was the sink), the heal happened from the 2nd brick to the 1st brick. During this the file got recreated on the sink brick and new-entry marking happened on the source bricks. Now on brick 3, i.e. the arbiter brick, we have data pending attributes set against both data bricks (against brick 2 as part of step 3 in the description, and against brick 1 as part of the new-entry marking during entry heal).
This issue is a duplicate of BZ #1340032.
Steps 6 and 7 would have succeeded because of caching. Can you provide the client log so we can verify whether step 7 actually succeeded or not? You should see the write for step 7 failing with EIO in the logs.

2. Sometimes in step 8, after heal completes, the entry on brick 1 gets recreated as part of entry heal but brick 2 still carries a data pending attribute against it.

# file: home/kus/gbricks/br1/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-1=0x000000010000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e
trusted.gfid2path.026991ed7ee7606b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727931
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

# file: home/kus/gbricks/br2/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-0=0x000000010000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d30303030303030303030
30312f726574727932

# file: home/kus/gbricks/br3/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-0=0x000000010000000000000000
trusted.afr.gv1-client-1=0x000000000000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e 


SHD log snippet:
[2018-11-09 13:04:34.543762] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gv1-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2018-11-09 13:04:34.553159] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gv1-replicate-0: expunging file 00000000-0000-0000-0000-000000000001/retry1 (2d22d2e8-ef71-4dee-bd78-519fa072538e) on gv1-client-0
[2018-11-09 13:04:34.584284] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed data selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e. sources=[0] 2  sinks=1 
[2018-11-09 13:04:34.584719] W [MSGID: 114031] [client-rpc-fops.c:2865:client3_3_lookup_cbk] 0-gv1-client-0: remote operation failed. Path: <gfid:2d22d2e8-ef71-4dee-bd78-519fa072538e> (2d22d2e8-ef71-4dee-bd78-519fa072538e) [No such file or directory]
[2018-11-09 13:04:34.587619] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. sources=[1] 2  sinks=0          
[2018-11-09 13:04:35.609488] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gv1-replicate-0: performing metadata selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e
[2018-11-09 13:04:35.611987] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed metadata selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e. sources=[1] 2  sinks=0


In both these cases the heal will not proceed, as the arbiter is the only source for heal but it cannot be a source for data heal. I have to investigate further to see why issue 2 is happening.

Adding needinfo for the client log, to verify that step 7 did not give an error only because of write-behind.
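
One way to check the client log for the step-7 write result (the fuse client log name is derived from the mount point, so the path below is an assumption):

# look for a failed WRITE / EIO around the step-7 timestamp
grep -E "WRITE|Input/output error" /var/log/glusterfs/mnt-newvol.log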

Comment 5 Anees Patel 2018-11-17 12:27:07 UTC
getfattr for the above file "retry2"

brick1:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_122ea358da894515bd3c3076cc136006/brick_a3e16a8004a3cb29edafab859d182fc0/brick/retry2 
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_122ea358da894515bd3c3076cc136006/brick_a3e16a8004a3cb29edafab859d182fc0/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-1=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

brick2:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_53cc243e52473fe12b459a5e46181e0c/brick_ee67805c4ba175f0a122b0374c15a130/brick/retry2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_53cc243e52473fe12b459a5e46181e0c/brick_ee67805c4ba175f0a122b0374c15a130/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-0=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.026991ed7ee7606b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727931
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932


arbiter brick:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_34af297f6dca3c38b7a297db36acc230/brick_748ba54728b88270ab0dc0d960006aae/brick/retry2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_34af297f6dca3c38b7a297db36acc230/brick_748ba54728b88270ab0dc0d960006aae/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-0=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-1=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

Comment 6 Anees Patel 2018-11-20 10:56:43 UTC
The issue is also reproducible on a plain arbiter volume.

Tested on the latest build
# rpm -qa | grep gluster
glusterfs-libs-3.12.2-27.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-27.el7rhgs.x86_64
python2-gluster-3.12.2-27.el7rhgs.x86_64
glusterfs-events-3.12.2-27.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-27.el7rhgs.x86_64
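
For completeness, a plain arbiter volume with the same 1 x (2 + 1) geometry can be created as follows; host names and brick paths are placeholders, not the ones from this report:

gluster volume create arbvol replica 3 arbiter 1 \
    host1:/bricks/brick1/arbvol \
    host2:/bricks/brick1/arbvol \
    host3:/bricks/brick1/arbvol
gluster volume start arbvol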

Comment 14 Anand Paladugu 2018-11-29 03:02:45 UTC
Do we know how many customers have reported this issue so far?

Comment 27 Anees Patel 2019-01-07 13:48:14 UTC
Verified the fix per the above test plan.
Verified on build:
# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.12.2-36.el7rhgs.x86_64
glusterfs-events-3.12.2-36.el7rhgs.x86_64
glusterfs-fuse-3.12.2-36.el7rhgs.x86_64

Comment 29 errata-xmlrpc 2019-02-04 07:41:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263