Description of problem:
Hit an issue while testing replica 3 to arbiter conversion: heal does not complete for an arbiter volume 1 x (2+1) when a rename operation is done while a brick is down.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-3.12.2-25.el7rhgs.x86_64
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64

How reproducible:
Most of the time (3 out of 4 attempts)

Steps to Reproduce:
1. Disable all client-side heals.
2. Create a file from the client (FUSE mount) while all three bricks are up.
# echo "Hi" >>retry1
3. Bring down brick 2 and append to the file.
# echo "Hi2" >>retry1
4. Bring down brick 1 and bring up brick 2. Now cat on the file fails:
# cat retry1
cat: retry1: Input/output error
5. Perform a rename:
# mv retry1 retry2
# ls retry2
6. After the rename, cat works on file retry2, which previously gave an I/O error:
# cat retry2
Hi
7. Append to file retry2:
# echo "W" >>retry2
The append works this time.
8. Now bring all bricks up and issue heal.
9. # cat retry2
cat: retry2: Input/output error
10. Heal info also lists file retry2 as pending heal.

A similar bug was raised upstream: BZ 1357000.

Actual results:
Tried this scenario four times; it hit the issue three times.
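The append at step 7 succeeding is consistent with AFR's client-quorum check rather than data availability: assuming the default auto client-quorum for a 1 x (2 + 1) volume (writes allowed while a majority of bricks is connected), two connected bricks are enough even when the only good data copy is the one that is down. A toy sketch of that check (an illustration, not GlusterFS source):

```python
# Toy model of AFR client-quorum for a 1 x (2 + 1) arbiter volume,
# assuming the default "auto" policy: writes are permitted while a
# majority of bricks is connected. Not GlusterFS source code.

def client_quorum_met(bricks_up: int, total_bricks: int = 3) -> bool:
    """True when more than half of the bricks are connected."""
    return bricks_up > total_bricks // 2

# Step 3: brick 2 down -> brick 1 + arbiter up, writes allowed.
print(client_quorum_met(2))   # True
# Steps 4-7: brick 1 down, brick 2 + arbiter up -> still 2 of 3, so the
# append at step 7 is also allowed, even though the only good data copy
# (brick 1) is offline and the arbiter stores no file data.
print(client_quorum_met(2))   # True
# Only a single connected brick would fail quorum and block writes.
print(client_quorum_met(1))   # False
```

Since the arbiter holds metadata only, the step 7 write lands as real data on brick 2 alone, which sets up the conflicting copies seen once all bricks return.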
# cat retry1
cat: retry1: No such file or directory
# cat file2
cat: file2: Input/output error
# cat nile2
cat: nile2: Input/output error

Heal is pending for the renamed file:

# gluster v heal newvol info
Brick 10.70.47.130:/bricks/brick1/newvol
Status: Connected
Number of entries: 0

Brick 10.70.46.213:/bricks/brick1/newvol
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/file2
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/nile2
<gfid:b9291fe7-06ae-4fea-b492-90882e91c299>/retry2
Status: Connected
Number of entries: 3

Brick 10.70.47.38:/bricks/brick1/newvol1
/dir2/file2
/dir2/nile2
/dir2/retry2
Status: Connected
Number of entries: 3

At step 9 the file cannot be read, so this appears to be a data-unavailability issue.

Expected results:
At step 7 the append should not be allowed, since the brick containing the good copy is down. Heal should also complete, with no files pending heal.

Additional info:
xattrs of the entries

Brick1
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick2
# getfattr -d -m . -e hex /bricks/brick1/newvol/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Brick3 / arbiter
# getfattr -d -m . -e hex /bricks/brick1/newvol1/dir2/retry2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/newvol1/dir2/retry2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.newvol-client-0=0x000000010000000000000000
trusted.afr.newvol-client-1=0x000000010000000000000000
trusted.gfid=0x5e9646bf976c4cd8832c1da9b10c730f
trusted.gfid2path.3cf2e0799590a8e0=0x62393239316665372d303661652d346665612d623439322d3930383832653931633239392f726574727932

Client logs:
[2018-11-02 09:56:14.772374] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2218: STAT() /dir2 => -1 (Input/output error)
[2018-11-02 09:56:15.961426] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-1: changing port to 49162 (from 0)
[2018-11-02 09:56:15.967799] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:56:15.972517] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-1: Connected to newvol-client-1, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:56:15.972553] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:56:15.972704] I [MSGID: 108002] [afr-common.c:5164:afr_notify] 4-newvol-replicate-0: Client-quorum is met
[2018-11-02 09:56:15.973448] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-1: Server lk version = 1
[2018-11-02 09:56:42.620808] W [fuse-bridge.c:1396:fuse_err_cbk] 0-glusterfs-fuse: 2285: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2018-11-02 09:57:13.664335] I [rpc-clnt.c:2007:rpc_clnt_reconfig] 4-newvol-client-0: changing port to 49158 (from 0)
[2018-11-02 09:57:13.669657] I [MSGID: 114057] [client-handshake.c:1397:select_server_supported_programs] 4-newvol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-11-02 09:57:13.670849] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 4-newvol-client-0: Connected to newvol-client-0, attached to remote volume '/bricks/brick1/newvol'.
[2018-11-02 09:57:13.670888] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 4-newvol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-02 09:57:13.671372] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 4-newvol-client-0: Server lk version = 1
[2018-11-02 09:57:32.177276] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing STAT on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 09:57:32.177385] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 2294: STAT() /dir2/retry2 => -1 (Input/output error)
[2018-11-02 10:16:43.798183] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 5e9646bf-976c-4cd8-832c-1da9b10c730f: split-brain observed. [Input/output error]
[2018-11-02 10:16:43.798368] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2404: READ => -1 gfid=5e9646bf-976c-4cd8-832c-1da9b10c730f fd=0x7f552000f3b0 (Input/output error)
[2018-11-02 10:17:07.922620] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid 53ba229a-a933-48a9-a7c9-2abd99c1e557: split-brain observed. [Input/output error]
[2018-11-02 10:17:07.922731] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2417: READ => -1 gfid=53ba229a-a933-48a9-a7c9-2abd99c1e557 fd=0x7f552002ac00 (Input/output error)
[2018-11-02 10:19:49.578790] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 4-newvol-replicate-0: Failing READ on gfid f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33: split-brain observed. [Input/output error]
[2018-11-02 10:19:49.578953] W [fuse-bridge.c:2337:fuse_readv_cbk] 0-glusterfs-fuse: 2433: READ => -1 gfid=f20e1de8-c29f-4bd9-9d2b-c9c56af6ce33 fd=0x7f552001ad20 (Input/output error)

Volume Info:
# gluster v info newvol
Volume Name: newvol
Type: Replicate
Volume ID: 05cd2dec-7a61-4eb0-88cd-b5b829bed17f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.130:/bricks/brick1/newvol
Brick2: 10.70.46.213:/bricks/brick1/newvol
Brick3: 10.70.47.38:/bricks/brick1/newvol1 (arbiter)
Options Reconfigured:
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
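The hex xattr values above can be decoded for inspection. A trusted.afr.<vol>-client-N value is a 12-byte changelog of three big-endian 32-bit counters — data, metadata and entry operations pending against brick N — and trusted.gfid2path.* is a plain '<parent-gfid>/<basename>' string. A short decoding sketch (the helper names are mine, not a GlusterFS API):

```python
import struct

def decode_afr(hex_val: str):
    """Split a trusted.afr.* value into its (data, metadata, entry)
    pending-operation counters, stored as big-endian 32-bit integers."""
    return struct.unpack(">III", bytes.fromhex(hex_val[2:]))

def decode_gfid2path(hex_val: str) -> str:
    """trusted.gfid2path.* values are plain '<parent-gfid>/<basename>' text."""
    return bytes.fromhex(hex_val[2:]).decode()

# Values copied from the getfattr output above.
print(decode_afr("0x000000010000000000000000"))
# (1, 0, 0): one data operation pending, i.e. brick2 and the arbiter
# each blame newvol-client-0 (brick1) for stale data.
print(decode_gfid2path(
    "0x62393239316665372d303661652d346665612d"
    "623439322d3930383832653931633239392f726574727932"))
# b9291fe7-06ae-4fea-b492-90882e91c299/retry2
```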
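Read through AFR's eyes, that xattr pattern explains the "split-brain observed" errors: a brick can serve reads only if no other brick holds a nonzero pending counter against it, and the arbiter can never serve data. A toy model of that selection over the dumped values (an illustration of the idea, not GlusterFS source):

```python
# Blame graph taken from the getfattr dumps above:
# brick1 (client-0) carries no trusted.afr keys at all,
# brick2 (client-1) blames client-0, and the arbiter (client-2)
# blames both client-0 and client-1 for pending data operations.
blames = {
    "client-0": set(),
    "client-1": {"client-0"},
    "client-2": {"client-0", "client-1"},
}

ARBITER = "client-2"  # stores metadata only, never file data

# A data source must be unblamed by every other brick and not the arbiter.
blamed = set().union(*blames.values())
data_sources = [b for b in blames if b not in blamed and b != ARBITER]

print(data_sources)
# [] -> every data brick is blamed by someone, so there is no copy AFR
# trusts to read from; reads fail with EIO and heal cannot pick a source.
```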
This bug got created twice due to a browser refresh; hence closing it as a duplicate of BZ#1645480.

*** This bug has been marked as a duplicate of bug 1645480 ***