Description of problem:
Automated test case fails with files pending heal.
Test-case name: test_entry_self_heal_heal_command
Protocol used: NFS
Volume type: 2 x (2+1)

Retried the automated test run on a local cluster and observed the same results: heal is pending for a few files.

Version-Release number of selected component (if applicable):
# rpm -qa | grep gluster
glusterfs-client-xlators-3.12.2-31.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-31.el7rhgs.x86_64
glusterfs-cli-3.12.2-31.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-libs-3.12.2-31.el7rhgs.x86_64
glusterfs-api-3.12.2-31.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Create a 2 x (2+1) volume.
2. NFS-mount the volume.
3. Disable client-side healing (metadata, data and entry).
4. Write data, directories and files from the mount point.
5. Set self-heal-daemon to off.
6. Bring down one brick from each replica set, e.g. b2 and b4.
7. Modify data from the client (create, mv and cp).
8. Bring all the bricks back up.
9. Set self-heal-daemon to on.
10. Check that all bricks are up and that all shd processes are running.
11. Issue heal.
12. Heal should complete, with no files pending and no files in split-brain.

Actual results:
At step 12, heal is pending for a few files.
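The steps above can be sketched with the gluster CLI. This is only a sketch of the procedure, assuming the volume name from this report; the exact mechanics of bringing a brick down (step 6) depend on the test harness, which typically kills the brick's glusterfsd process.

```shell
VOL=testvol_distributed-replicated

# Steps 3 and 5: disable client-side heals and the self-heal daemon
gluster volume set $VOL cluster.data-self-heal off
gluster volume set $VOL cluster.metadata-self-heal off
gluster volume set $VOL cluster.entry-self-heal off
gluster volume set $VOL cluster.self-heal-daemon off

# Step 6: bring down one brick per replica set by killing the brick PIDs
# shown in `gluster volume status $VOL` (harness-specific, not shown here)

# Steps 8 and 9: restart the downed bricks and re-enable shd
gluster volume start $VOL force
gluster volume set $VOL cluster.self-heal-daemon on

# Steps 11 and 12: trigger heal, then verify nothing is pending
gluster volume heal $VOL
gluster volume heal $VOL info
```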
Expected results:
Heal should be completed for all files/dirs.

Additional info:
# gluster v info testvol_distributed-replicated

Volume Name: testvol_distributed-replicated
Type: Distributed-Replicate
Volume ID: 2dad8909-862f-42c9-923d-8eafdfd1e50c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.62:/bricks/brick5/testvol_distributed-replicated_brick0
Brick2: 10.70.42.103:/bricks/brick5/testvol_distributed-replicated_brick1
Brick3: 10.70.41.187:/bricks/brick5/testvol_distributed-replicated_brick2 (arbiter)
Brick4: 10.70.41.216:/bricks/brick9/testvol_distributed-replicated_brick3
Brick5: 10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4
Brick6: 10.70.43.64:/bricks/brick9/testvol_distributed-replicated_brick5 (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: on
cluster.data-self-heal: off
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
cluster.server-quorum-ratio: 51

# gluster v heal testvol_distributed-replicated info
Brick 10.70.43.62:/bricks/brick5/testvol_distributed-replicated_brick0
Status: Connected
Number of entries: 0

Brick 10.70.42.103:/bricks/brick5/testvol_distributed-replicated_brick1
Status: Connected
Number of entries: 0

Brick 10.70.41.187:/bricks/brick5/testvol_distributed-replicated_brick2
Status: Connected
Number of entries: 0

Brick 10.70.41.216:/bricks/brick9/testvol_distributed-replicated_brick3
/files/user2_a/dir0_a/dir0_a
/files/user2_a/dir0_a
Status: Connected
Number of entries: 2

Brick 10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4
<gfid:d00340d5-f1a1-4e5d-8e78-a8fe7ad93e78>/user2_a/dir0_a/dir0_a
<gfid:d00340d5-f1a1-4e5d-8e78-a8fe7ad93e78>/user2_a/dir0_a
Status: Connected
Number of entries: 2

Brick 10.70.43.64:/bricks/brick9/testvol_distributed-replicated_brick5
/files/user2_a/dir0_a/dir0_a
/files/user2_a/dir0_a
Status: Connected
Number of entries: 2

Change-logs for directory dir0_a:

[root@dhcp43-64 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c
trusted.glusterfs.dht=0x00000000000000007fffffffffffffff
trusted.glusterfs.dht.mds=0x00000000

[root@dhcp41-216 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c
trusted.glusterfs.dht=0x00000000000000007fffffffffffffff
trusted.glusterfs.dht.mds=0x00000000

[root@dhcp42-104 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick4/files/user2_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick4/files/user2_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xd5e24a6225434db68e46b9e261641e9c

Change-logs for directory dir0_a/dir0_a:

[root@dhcp43-64 ~]# getfattr -de hex -m .
/bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick5/files/user2_a/dir0_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000e00000001
trusted.gfid=0x197c24ca87ad49668d2773e8af8e8684
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe

[root@dhcp41-216 ~]# getfattr -de hex -m . /bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/dir0_a/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick9/testvol_distributed-replicated_brick3/files/user2_a/dir0_a/dir0_a/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol_distributed-replicated-client-4=0x000000000000000600000001
trusted.gfid=0x197c24ca87ad49668d2773e8af8e8684
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe

No entry is present for directory dir0_a/dir0_a on the back-end brick on 42-104 (the data brick).
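For reference, a trusted.afr.* changelog value is 24 hex digits: the first 8 count pending data operations, the next 8 pending metadata operations, and the last 8 pending entry operations (per the upstream AFR self-heal documentation). A small sketch to decode the values dumped above:

```shell
# Decode a trusted.afr.* changelog xattr into its three pending counters.
# Layout: 8 hex digits data | 8 hex digits metadata | 8 hex digits entry.
decode_afr() {
    local v=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${v:0:8}))" "$((16#${v:8:8}))" "$((16#${v:16:8}))"
}

# Value seen on bricks 5 and 3 for dir0_a, blaming client-4 (the 42-104 brick):
decode_afr 0x000000000000000e00000001
# → data=0 metadata=14 entry=1
```

The non-zero entry counter is what marks entry self-heal as pending against the brick on 10.70.42.104.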
Hi Anees,

The afr pending xattrs indicate that entry self-heal is pending on '10.70.42.104:/bricks/brick9/testvol_distributed-replicated_brick4', and that this is where the heal is getting stuck. Can you check whether this is a duplicate of BZ 1640148, where entry heal cannot proceed because of a missing gfid symlink for the directory inside .glusterfs? If yes, we can close it as a duplicate.

-Ravi
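To check for the missing gfid symlink Ravi mentions: a directory's gfid maps to a path under the brick's .glusterfs tree, laid out as <brick>/.glusterfs/<first two hex chars>/<next two>/<full uuid>. A hypothetical helper (gfid_path is not a gluster tool, just an illustration) to compute that path from the getfattr output above:

```shell
# Hypothetical helper: turn a trusted.gfid value (as printed by getfattr)
# into the corresponding .glusterfs path on a brick. For a directory this
# path should be a symlink; its absence is the condition described in
# BZ 1640148.
gfid_path() {
    local brick=$1 raw=${2#0x}
    local uuid="${raw:0:8}-${raw:8:4}-${raw:12:4}-${raw:16:4}-${raw:20:12}"
    echo "$brick/.glusterfs/${uuid:0:2}/${uuid:2:2}/$uuid"
}

# gfid of dir0_a/dir0_a from the dumps above, on the 42-104 brick:
gfid_path /bricks/brick9/testvol_distributed-replicated_brick4 \
          0x197c24ca87ad49668d2773e8af8e8684
# → /bricks/brick9/testvol_distributed-replicated_brick4/.glusterfs/19/7c/197c24ca-87ad-4966-8d27-73e8af8e8684
```

Running `ls -l` on that path on each data brick would confirm whether the symlink exists.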
Closing as a duplicate.

*** This bug has been marked as a duplicate of bug 1640148 ***