Description of problem:
While verifying bz#1645480, hit an issue where a directory is pending heal.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create a 2 x (2+1) arbiter volume, brick{1..6}.
2. Start the volume and fuse mount it.
3. Create a few files (~200) inside a few directories; create hardlinks and softlinks.
4. Bring brick 2 and brick 5 down.
5. Run continuous metadata operations (rename, chgrp, chown) from the mount; keep these running throughout.
6. Occasionally bring brick 1 and brick 4 down and bring the bricks from step 4 back up.
7. A few "transport endpoint not connected" errors are seen when the good brick happens to be down; this is expected.
8. Bring brick 2 and brick 5 down again, and brick 1 and brick 4 back up.
9. All of the above is done while the I/O from step 5 is ongoing.
(A rough shell sketch of this flow is included after the volume info below.)

Actual results:
Initial output:

# gluster v heal test info
Brick 10.70.46.55:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.184:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.193:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.67:/bricks/brick1/testing
<gfid:725ab1d7-bed9-4a25-b1a0-95e2c15605b7>
Status: Connected
Number of entries: 1

Brick 10.70.46.169:/bricks/brick1/testing
<gfid:725ab1d7-bed9-4a25-b1a0-95e2c15605b7>
Status: Connected
Number of entries: 1

Brick 10.70.47.122:/bricks/brick1/testing
Status: Connected
Number of entries: 0

After a few minutes, heal info started reporting the directory /level00 as pending heal:

~]# gluster v heal test info
Brick 10.70.46.55:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.184:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.46.193:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Brick 10.70.47.67:/bricks/brick1/testing
/level00
Status: Connected
Number of entries: 1

Brick 10.70.46.169:/bricks/brick1/testing
/level00
Status: Connected
Number of entries: 1

Brick 10.70.47.122:/bricks/brick1/testing
Status: Connected
Number of entries: 0

Expected results:
No files/directories should be pending heal.

Additional info:
Volume info:

~]# gluster v info test
Volume Name: test
Type: Distributed-Replicate
Volume ID: 11ee3f35-f99d-49ce-95c7-bbee829bc6f1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.46.55:/bricks/brick1/testing
Brick2: 10.70.47.184:/bricks/brick1/testing
Brick3: 10.70.46.193:/bricks/brick1/testing (arbiter)
Brick4: 10.70.47.67:/bricks/brick1/testing
Brick5: 10.70.46.169:/bricks/brick1/testing
Brick6: 10.70.47.122:/bricks/brick1/testing (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
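For reference, a rough shell sketch of the reproduction flow above. Hostnames and brick paths are taken from the volume info; the file counts, loop structure, and the kill/restart mechanics (pgrep pattern, "start force") are illustrative assumptions, not the exact commands that were run:

# Steps 1-2: create the 2 x (2+1) arbiter volume, start it, fuse mount it
gluster volume create test replica 3 arbiter 1 \
  10.70.46.55:/bricks/brick1/testing 10.70.47.184:/bricks/brick1/testing \
  10.70.46.193:/bricks/brick1/testing 10.70.47.67:/bricks/brick1/testing \
  10.70.46.169:/bricks/brick1/testing 10.70.47.122:/bricks/brick1/testing
gluster volume start test
mount -t glusterfs 10.70.46.55:/test /mnt/test

# Step 3: a few directories with ~200 files each, plus hard and soft links
for d in level00 level01 level02; do
  mkdir -p /mnt/test/$d
  for i in $(seq 1 200); do
    echo data > /mnt/test/$d/f$i
    ln /mnt/test/$d/f$i /mnt/test/$d/hl$i
    ln -s f$i /mnt/test/$d/sl$i
  done
done

# Step 4: bring brick 2 and brick 5 down, e.g. by killing the brick
# process on 10.70.47.184 and 10.70.46.169 (process-match pattern is
# an assumption; verify against the actual glusterfsd command line):
kill -9 $(pgrep -f 'glusterfsd.*bricks/brick1/testing')

# Step 5: continuous metadata churn from the mount, left running throughout
while true; do
  for f in /mnt/test/level00/f*; do
    mv "$f" "$f.tmp" && mv "$f.tmp" "$f"
    chgrp root "$f" && chown root "$f"
  done
done &

# Steps 6-8: cycle brick availability while the churn runs; a killed
# brick can be brought back up with:
gluster volume start test force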
Also, the getfattrs for the directory level00 on the second subvolume, where client-3 blames client-4 and vice versa:

-55 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

-184 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

-193 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-0=0x000000000000000000000000
trusted.afr.test-client-1=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x000000000000000000000000aaac18d5
trusted.glusterfs.dht.mds=0x00000000

-67 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-4=0x000000000000000000000192
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff

-169 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-3=0x0000000000000000000001a2
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff

-122 ~]# getfattr -d -m . -e hex /bricks/brick1/testing/level00
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/testing/level00
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.test-client-3=0x000000000000000000000000
trusted.afr.test-client-4=0x000000000000000000000000
trusted.gfid=0x725ab1d7bed94a25b1a095e2c15605b7
trusted.glusterfs.dht=0x0000000000000000aaac18d6ffffffff
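For context on the blame above: each trusted.afr.<volume>-client-N value encodes three big-endian 32-bit counters, the pending data, metadata, and entry operations recorded against that client/brick. A minimal bash sketch to decode the two non-zero values (illustrative helper, not part of any Gluster tooling):

# Split a trusted.afr.* hex value into its data/metadata/entry counters.
decode_afr() {
  local v=${1#0x}   # strip the leading 0x
  printf 'data=%d metadata=%d entry=%d\n' \
    $((16#${v:0:8})) $((16#${v:8:8})) $((16#${v:16:8}))
}
decode_afr 0x000000000000000000000192   # on .67, against client-4 -> entry=402
decode_afr 0x0000000000000000000001a2   # on .169, against client-3 -> entry=418

So the brick on .67 (client-3) holds 402 pending entry operations against client-4, while the brick on .169 (client-4) holds 418 against client-3: each side blames the other, leaving no unambiguous heal source, which is consistent with /level00 staying in heal info.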
*** Bug 1727257 has been marked as a duplicate of this bug. ***
upstream patch: https://review.gluster.org/#/c/glusterfs/+/23005/1
(In reply to Mohammed Rafi KC from comment #14)
> upstream patch: https://review.gluster.org/#/c/glusterfs/+/23005/1

This was merged upstream in August; how come it's not part of 3.5.1?
(In reply to Yaniv Kaul from comment #18)
> (In reply to Mohammed Rafi KC from comment #14)
> > upstream patch: https://review.gluster.org/#/c/glusterfs/+/23005/1
>
> This was merged upstream in August; how come it's not part of 3.5.1?

It was a patch that came in with the shd multiplexing feature; this bug doesn't exist without that feature. Since shd multiplexing was reverted from 3.5.0, the patch is not part of the 3.5 branches yet.