Description of problem:
Hit an issue while testing replica 3 to arbiter conversion: heal does not complete on an arbiter volume 1 x (2+1) when a rename operation is done while a brick is down.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-3.12.2-25.el7rhgs.x86_64
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64

How reproducible:
Most of the time (3 out of 4 attempts)

Steps to Reproduce:
1. Disable all client-side heals.
2. Create a file from the client (fuse mount) while all three bricks are up:
# echo "Hi" >> retry1
3. Bring down brick 2 and append to the file:
# echo "Hi2" >> retry1
4. Bring down brick 1 and bring up brick 2. Now cat on the file fails:
# cat retry1
cat: retry1: Input/output error
5. Perform a rename:
# mv retry1 retry2
# ls
retry2
6. After the rename, cat works on retry2, which previously gave an I/O error:
# cat retry2
Hi
7. Append to retry2:
# echo "W" >> retry2
The append succeeds this time.
8. Now bring all bricks up and issue a heal.
9. # cat retry2
cat: retry2: Input/output error
10. Heal info also lists retry2 as pending heal.

A similar bug was raised upstream: Bug 1357000.

Actual results:
Tried this scenario four times; it hit the issue three times.
# cat retry1
cat: retry1: No such file or directory
# cat file2
cat: file2: Input/output error
# cat nile2
cat: nile2: Input/output error

Heal is pending for the renamed file:
# gluster v heal newvol info
Brick 10.70.47.130:/bricks/brick1/newvol
Status: Connected
Number of entries: 0

Brick 10.70.46.213:/bricks/brick1/newvol
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/file2
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/nile2
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/retry2
Status: Connected
Number of entries: 3

Brick 10.70.47.38:/bricks/brick1/newvol1
/dir2/file2
/dir2/nile2
/dir2/retry2
Status: Connected
Number of entries: 3

At step 9 the file cannot be read, so this appears to be a data-unavailability issue.

Expected results:
At step 7 the append should not be allowed, since the brick containing the good copy is down. Heal should also complete, with no files pending heal.

Additional info:
xattrs of the entries:

Brick1
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick2
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick3 / arbiter
# getfattr -d -m . -e hex /bricks/brick1/newvol1/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol1/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.afr.newvol-client-1=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Client logs:
[2018-11-02 09:56:14.772374] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2218: STAT() /dir2 => -1 (Input/output error)
[2018-11-02 09:56:15.961426] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-1: changing port to 49162 (from 0)
[2018-11-02 09:56:15.967799] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:56:15.972517] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-1: Connected to newvol-client-1, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:56:15.972553] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:56:15.972704] I [MSGID: 108002] [afr-common.c:5164:afr_notify] 4-newvol-replicate-0: Client-quorum is met
[2018-11-02 09:56:15.973448] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-1: Server lk version = 1
[2018-11-02 09:56:42.620808] W [fuse-bridge.c:1396:fuse_err_cbk] 0-glusterfs-fuse: 2285: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2018-11-02 09:57:13.664335] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-0: changing port to 49158 (from 0)
[2018-11-02 09:57:13.669657] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:57:13.670849] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-0: Connected to newvol-client-0, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:57:13.670888] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:57:13.671372] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-0: Server lk version = 1
[2018-11-02 09:57:32.177276] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing STAT on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 09:57:32.177385] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2294: STAT() /dir2/retry2 => -1 (Input/output error)
[2018-11-02 10:16:43.798183] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 10:16:43.798368] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2404: READ => -1 gfid=5e9646bf-976c-4cd8-832c-1da9b10c730f fd=0x7f552000f3b0 (Input/output error)
[2018-11-02 10:17:07.922620] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 53ba229a-a933-48a9-a7c9-2abd99c1e557: split-brain observed. [Input/output error]
[2018-11-02 10:17:07.922731] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2417: READ => -1 gfid=53ba229a-a933-48a9-a7c9-2abd99c1e557 fd=0x7f552002ac00 (Input/output error)
[2018-11-02 10:19:49.578790] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33: split-brain observed. [Input/output error]
[2018-11-02 10:19:49.578953] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2433: READ => -1 gfid=f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33 fd=0x7f552001ad20 (Input/output error)

Volume Info:
# gluster v info newvol
Volume Name: newvol
Type: Replicate
Volume ID: 05cd2dec-7a61-4eb0-88cd-b5b829bed17f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.130:/bricks/brick1/newvol
Brick2: 10.70.46.213:/bricks/brick1/newvol
Brick3: 10.70.47.38:/bricks/brick1/newvol1 (arbiter)
Options Reconfigured:
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
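For reference when reading the xattrs above: the trusted.afr.<volname>-client-N value is three big-endian 32-bit counters, recording pending data, metadata and entry operations against client/brick N. A minimal Python sketch to decode the values (my own illustration, not a gluster tool):

import struct

def decode_afr(hexval):
    # Three network-order uint32s: pending data, metadata, entry ops
    raw = bytes.fromhex(hexval[2:] if hexval.startswith("0x") else hexval)
    data, meta, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": meta, "entry": entry}

# Value seen on brick2 and the arbiter above:
print(decode_afr("0x000000010000000000000000"))
# -> {'data': 1, 'metadata': 0, 'entry': 0}

Read this way, brick1 carries no trusted.afr xattrs at all, brick2 blames brick1 (client-0) for one pending data operation, and the arbiter blames both data bricks. Since the arbiter holds no file data, AFR is left with no copy it can trust for reads, which matches the "split-brain observed" EIO in the client log.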
*** Bug 1645482 has been marked as a duplicate of this bug. ***
I tried this locally and hit two issues.

1. As pointed out by Anees, the arbiter becomes a source according to the xattrs that get set. This is because of the new-entry marking. When the entry got healed from the source brick (in this case the 2nd and 3rd bricks were the sources, since they had the renamed file, and the 1st brick was the sink), heal happened from the 2nd brick to the 1st brick. During this, the file was recreated on the sink brick and new-entry marking happened on the source bricks. So on brick 3, i.e. the arbiter brick, we have data-pending attributes set for both data bricks: for brick 2 as part of step 3 in the description, and for brick 1 as part of the new-entry marking during entry heal. This issue is a duplicate of BZ #1340032.

Steps 6 and 7 would have succeeded because of caching. Can you attach the client log, to verify whether step 7 actually succeeded? You should see a write failing with EIO in the logs for step 7.

2. Sometimes in step 8, after heal has completed, the entry on brick 1, even though it is recreated as part of entry heal, still carries a data-pending attribute on brick 2.

# file: home/kus/gbricks/br1/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-1=0x000000010000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e
trusted.gfid2path.026991ed7ee7606b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727931
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

# file: home/kus/gbricks/br2/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-0=0x000000010000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

# file: home/kus/gbricks/br3/retry2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv1-client-0=0x000000010000000000000000
trusted.afr.gv1-client-1=0x000000000000000000000000
trusted.gfid=0x2d22d2e8ef714deebd78519fa072538e

SHD log snippet:
[2018-11-09 13:04:34.543762] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gv1-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2018-11-09 13:04:34.553159] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gv1-replicate-0: expunging file 00000000-0000-0000-0000-000000000001/retry1 (2d22d2e8-ef71-4dee-bd78-519fa072538e) on gv1-client-0
[2018-11-09 13:04:34.584284] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed data selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e. sources=[0] 2 sinks=1
[2018-11-09 13:04:34.584719] W [MSGID: 114031] [client-rpc-fops.c:2865:client3_3_lookup_cbk] 0-gv1-client-0: remote operation failed. Path: <gfid:2d22d2e8-ef71-4dee-bd78-519fa072538e> (2d22d2e8-ef71-4dee-bd78-519fa072538e) [No such file or directory]
[2018-11-09 13:04:34.587619] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. sources=[1] 2 sinks=0
[2018-11-09 13:04:35.609488] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gv1-replicate-0: performing metadata selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e
[2018-11-09 13:04:35.611987] I [MSGID: 108026] [afr-self-heal-common.c:1732:afr_log_selfheal] 0-gv1-replicate-0: Completed metadata selfheal on 2d22d2e8-ef71-4dee-bd78-519fa072538e. sources=[1] 2 sinks=0

In both cases the heal will not proceed, because the arbiter is the only source for the heal, but it cannot be a source for data heal. I have to investigate further to see why issue 2 is happening.

Adding needinfo for the client log, to verify that step 7 did not succeed merely because of write-behind.
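To make the stall concrete, here is a simplified sketch of the source-resolution idea described above (hypothetical Python for illustration, not the actual afr code; the function and data layout are made up): a brick can act as a data-heal source only if no other brick blames it, and the arbiter can never serve file data.

def data_heal_sources(blames, arbiter):
    # blames[i] is the set of bricks that brick i accuses of
    # having pending data operations.
    blamed = set()
    for accused in blames.values():
        blamed |= accused
    unblamed = [b for b in blames if b not in blamed]
    # The arbiter stores no data, so exclude it as a data source.
    return [b for b in unblamed if b != arbiter]

# Issue 1 state: brick 1 blames brick 0; the arbiter (brick 2)
# blames both data bricks.
blames = {0: set(), 1: {0}, 2: {0, 1}}
print(data_heal_sources(blames, arbiter=2))  # -> []  no usable data source

With that state, the only unblamed copy is the arbiter, so no data heal can proceed, exactly as observed.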
getfattr for the above file "retry2":

brick1:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_122ea358da894515bd3c3076cc136006/brick_a3e16a8004a3cb29edafab859d182fc0/brick/retry2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_122ea358da894515bd3c3076cc136006/brick_a3e16a8004a3cb29edafab859d182fc0/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-1=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

brick2:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_53cc243e52473fe12b459a5e46181e0c/brick_ee67805c4ba175f0a122b0374c15a130/brick/retry2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_53cc243e52473fe12b459a5e46181e0c/brick_ee67805c4ba175f0a122b0374c15a130/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-0=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.026991ed7ee7606b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727931
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932

arbiter brick:
# getfattr -d -m . -e hex /var/lib/heketi/mounts/vg_34af297f6dca3c38b7a297db36acc230/brick_748ba54728b88270ab0dc0d960006aae/brick/retry2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_34af297f6dca3c38b7a297db36acc230/brick_748ba54728b88270ab0dc0d960006aae/brick/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-0=0x000000000000000000000000
trusted.afr.vol_a00f012d78a5a167acc5abc66795ef95-client-1=0x000000010000000000000000
trusted.gfid=0x7c0c51d1c6c7455d8e408877c197996d
trusted.gfid2path.1bba90641876e471=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932
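As an aside, the trusted.gfid2path.* values are simply the hex encoding of "<parent-gfid>/<basename>", so the rename history is visible straight from the xattrs; for example, brick2 above still carries entries for both names (illustrative Python):

old = "30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727931"
new = "30303030303030302d303030302d303030302d303030302d3030303030303030303030312f726574727932"
print(bytes.fromhex(old).decode())  # 00000000-0000-0000-0000-000000000001/retry1
print(bytes.fromhex(new).decode())  # 00000000-0000-0000-0000-000000000001/retry2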
The issue is reproducible on a plain arbiter volume as well. Tested on the latest build:
# rpm -qa | grep gluster
glusterfs-libs-3.12.2-27.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-27.el7rhgs.x86_64
python2-gluster-3.12.2-27.el7rhgs.x86_64
glusterfs-events-3.12.2-27.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-27.el7rhgs.x86_64
Do we know how many customers have reported the issue so far?
Verified the fix as per the above test plan.

Verified on build:
# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.12.2-36.el7rhgs.x86_64
glusterfs-events-3.12.2-36.el7rhgs.x86_64
glusterfs-fuse-3.12.2-36.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0263