Description of problem:
Rebooted one node of the 4-node cluster. After the reboot, all entries were healed except two, which have remained in the same state for 30 minutes.

Version-Release number of selected component (if applicable):
glusterfs-6.0-25.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. On a 4-node cluster with brick multiplexing enabled, created two replicate volumes and mounted them on the servers.
2. Created a 6x(4+2) disperse volume, ec-vol.
3. Mounted ec-vol on 5 clients and the 4 servers.
4. Ran I/O (Linux kernel untar and crefi) on ec-vol for 3 days.
5. Rebooted one node.

Actual results:
After the node reboot, all entries were healed except two.

Expected results:
All entries should be healed.

Additional info:
Heal info output:

[root@rhs-client25 ~]# gluster vol heal ec-vol info
Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick1/ec-vol1
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick1/ec-vol1
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc
Status: Connected
Number of entries: 2

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc
Status: Connected
Number of entries: 2

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc
Status: Connected
Number of entries: 2

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc
Status: Connected
Number of entries: 2

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick10/ec-vol10
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick10/ec-vol10
Status: Connected
Number of entries: 0

-------------------------8<----------------------------

getfattr output for the entry pending heal, from each brick of its subvolume:

[root@rhs-client25 ec-vol7]# getfattr -d -m . -e hex /bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client32 ec-vol7]# getfattr -d -m . -e hex /bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client18 ec-vol8]# getfattr -d -m . -e hex IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
# file: IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client19 ec-vol8]# getfattr -d -m . -e hex IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
# file: IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client25 ~]# getfattr -d -m . -e hex /bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c

[root@rhs-client32 ec-vol8]# getfattr -d -m . -e hex /bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

#########################################

glustershd log:

The gfid handle of the entry pending heal is /bricks/brick8/ec-vol8/.glusterfs/b5/8b/b58b73fc-bab7-458e-91b9-2be3b394296c. shd log messages for this gfid:

[2019-12-26 06:39:03.374721] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
[2019-12-26 06:39:03.393421] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
[2019-12-26 06:39:03.394838] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-14: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394852] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-13: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394949] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-15: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394992] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-17: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.444968] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-16: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.529382] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
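The getfattr values above are easier to compare once decoded. The sketch below uses hypothetical helpers (not part of GlusterFS) and rests on stated assumptions: trusted.ec.version and trusted.ec.dirty are each two big-endian 64-bit counters (data, metadata); trusted.glusterfs.dht holds four big-endian 32-bit fields, the last two being the start and stop of the directory's hash range; and a gfid's handle lives under <brick>/.glusterfs/<first two hex digits>/<next two>/<gfid>, which matches the handle path quoted in the shd log section.

```python
import struct
import uuid


def hex_to_bytes(value: str) -> bytes:
    """Convert a getfattr '-e hex' value such as '0x0001...' to raw bytes."""
    if value.startswith("0x"):
        value = value[2:]
    return bytes.fromhex(value)


def decode_ec_counters(value: str) -> tuple:
    """Decode trusted.ec.version / trusted.ec.dirty.

    Assumption: two big-endian uint64 counters, ordered (data, metadata).
    """
    data, metadata = struct.unpack(">QQ", hex_to_bytes(value))
    return data, metadata


def decode_dht_layout(value: str) -> tuple:
    """Decode trusted.glusterfs.dht into its (start, stop) hash range.

    Assumption: four big-endian uint32 fields; only the last two
    (range start and range stop) are returned here.
    """
    _first, _hash_type, start, stop = struct.unpack(">IIII", hex_to_bytes(value))
    return start, stop


def gfid_backend_path(brick_root: str, gfid_value: str) -> str:
    """Map a trusted.gfid value to its <brick>/.glusterfs/xx/yy/<gfid> handle."""
    gfid = str(uuid.UUID(bytes=hex_to_bytes(gfid_value)))
    return f"{brick_root}/.glusterfs/{gfid[0:2]}/{gfid[2:4]}/{gfid}"


if __name__ == "__main__":
    # Values copied from the getfattr output for .../staging/iio.
    print("version:", decode_ec_counters("0x00000000000000010000000000000005"))
    print("dirty:", decode_ec_counters("0x00000000000000020000000000000002"))
    start, stop = decode_dht_layout("0x0000000000000000aaaaaaa8d5555551")
    print(f"dht range: 0x{start:08x}-0x{stop:08x}")
    print(gfid_backend_path("/bricks/brick8/ec-vol8",
                            "0xb58b73fcbab7458e91b92be3b394296c"))
```

With the values from this report it prints version (1, 5), dirty (2, 2), the hash range 0xaaaaaaa8-0xd5555551, and the same .glusterfs handle path quoted in the shd log section. The rhs-client25 brick8 copy carries only trusted.gfid, so there is nothing for the EC or DHT decoders to read on that brick.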
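The shd log refers to bricks by client index (ec-vol-client-28, ec-vol-client-13 through -17). The sketch below is a hypothetical way to map an index back to a brick and to pull per-brick entry counts out of heal info text. It assumes that client-N is numbered in the brick order printed by heal info and that each 4+2 disperse subvolume is six consecutive bricks; the brick list is rebuilt from the heal info output in this report rather than read from a live cluster.

```python
import re

# Line shapes seen in the `gluster vol heal ec-vol info` output.
BRICK_RE = re.compile(r"^Brick (\S+)$")
COUNT_RE = re.compile(r"^Number of entries: (\d+)$")

SAMPLE_HEAL_INFO = """\
Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc
Status: Connected
Number of entries: 2
"""


def parse_heal_info(text):
    """Return [(brick, entry_count), ...] in the order heal info prints them."""
    result = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        brick = BRICK_RE.match(line)
        if brick:
            current = brick.group(1)
            continue
        count = COUNT_RE.match(line)
        if count and current is not None:
            result.append((current, int(count.group(1))))
            current = None
    return result


def demo_brick_order():
    """Hypothetical: rebuild the 36-brick order shown in the heal info output."""
    hosts = ["rhs-client18", "rhs-client19", "rhs-client25", "rhs-client32"]
    order = [(h, 1) for h in hosts[2:]]                      # brick1: client25, client32
    order += [(h, n) for n in range(2, 10) for h in hosts]   # brick2..brick9: all four hosts
    order += [(h, 10) for h in hosts[:2]]                    # brick10: client18, client19
    return [f"{h}.lab.eng.blr.redhat.com:/bricks/brick{n}/ec-vol{n}" for h, n in order]


def locate_client(bricks, client_index, disperse_count=6):
    """Hypothetical: map ec-vol-client-N to (brick, disperse subvolume index).

    Assumes client-N follows the brick order and that each disperse
    subvolume is disperse_count consecutive bricks (4 data + 2 redundancy).
    """
    return bricks[client_index], client_index // disperse_count


if __name__ == "__main__":
    for brick, entries in parse_heal_info(SAMPLE_HEAL_INFO):
        print(f"{entries:3d}  {brick}")
    print(locate_client(demo_brick_order(), 28))
```

Under these assumptions, client-28 resolves to rhs-client25.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8 in subvolume 4. That is the same subvolume as the four bricks listing the pending entries, and it is the brick whose getfattr output lacks the trusted.ec.* attributes.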
Moving to next BU as https://bugzilla.redhat.com/show_bug.cgi?id=1640148 is moved to 3.5.3.