Bug 1786553 - heal on disperse volume is not completing for 2 files after 30 mins of node reboot.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ashish Pandey
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On: 1640148
Blocks:
Reported: 2019-12-26 08:08 UTC by Bala Konda Reddy M
Modified: 2021-12-22 06:29 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-20 15:24:46 UTC
Embargoed:



Description Bala Konda Reddy M 2019-12-26 08:08:22 UTC
Description of problem:

Rebooted one of the nodes in the 4-node cluster. After the reboot, all files were healed except two, which have remained in the same state for 30 minutes.


Version-Release number of selected component (if applicable):
glusterfs-6.0-25.el7rhgs.x86_64

How reproducible:
2/2



Steps to Reproduce:
1. On a 4-node cluster with brick multiplexing enabled, created two replicate volumes and mounted them on the servers.
2. Created a 6x(4+2) disperse volume, ec-vol.
3. Mounted ec-vol on 5 clients and the 4 servers.
4. Ran I/O (Linux kernel untar and crefi) on ec-vol for 3 days.
5. Rebooted one node.

Actual results:
After the node reboot, all files except two are healed.

Expected results:
All files should be healed 

Additional info:


Heal info output
[root@rhs-client25 ~]# gluster vol heal ec-vol info
Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick1/ec-vol1
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick1/ec-vol1
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick2/ec-vol2
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick3/ec-vol3
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick4/ec-vol4
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick5/ec-vol5
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick6/ec-vol6
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick7/ec-vol7
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio 
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc 
Status: Connected
Number of entries: 2

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio 
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc 
Status: Connected
Number of entries: 2

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio 
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc 
Status: Connected
Number of entries: 2

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick8/ec-vol8
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio 
/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/adc 
Status: Connected
Number of entries: 2

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client25.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client32.lab.eng.blr.redhat.com:/bricks/brick9/ec-vol9
Status: Connected
Number of entries: 0

Brick rhs-client18.lab.eng.blr.redhat.com:/bricks/brick10/ec-vol10
Status: Connected
Number of entries: 0

Brick rhs-client19.lab.eng.blr.redhat.com:/bricks/brick10/ec-vol10
Status: Connected
Number of entries: 0

-------------------------8<----------------------------
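Heal info appears to print bricks in volume order, and in a 6x(4+2) volume each consecutive run of 6 bricks forms one disperse subvolume. The sketch below, assuming that ordering, parses heal-info text and groups the bricks that report pending entries by subvolume index. `parse_heal_info` and `pending_by_subvol` are hypothetical helpers, not part of the gluster CLI; the demo rebuilds the brick order from the heal-info output above with the domain suffix dropped and the entry paths abbreviated.

```python
# Hypothetical helpers, not part of the gluster CLI: group heal-info bricks by
# disperse subvolume, assuming heal info lists bricks in volume order.
DISPERSE_COUNT = 4 + 2  # 6x(4+2): each subvolume spans 6 consecutive bricks


def parse_heal_info(text):
    """Parse `gluster volume heal <vol> info` text into [(brick, [entries])]."""
    bricks = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            bricks.append((line[len("Brick "):], []))
        elif bricks and line and not line.startswith(("Status:", "Number of entries:")):
            # Any other non-empty line under a brick is a pending entry.
            bricks[-1][1].append(line)
    return bricks


def pending_by_subvol(bricks, disperse_count=DISPERSE_COUNT):
    """Map subvolume index -> [(brick index, brick)] for bricks with entries."""
    groups = {}
    for index, (brick, entries) in enumerate(bricks):
        if entries:
            groups.setdefault(index // disperse_count, []).append((index, brick))
    return groups


if __name__ == "__main__":
    # Rebuild the brick order shown above: brick1 on client25/32,
    # brick2..brick9 on client18/19/25/32, brick10 on client18/19.
    hosts = ["rhs-client18", "rhs-client19", "rhs-client25", "rhs-client32"]
    layout = [(h, 1) for h in hosts[2:]]
    layout += [(h, n) for n in range(2, 10) for h in hosts]
    layout += [(h, 10) for h in hosts[:2]]
    # Bricks that reported the two iio entries in heal info.
    pending = {("rhs-client32", 7), ("rhs-client18", 8),
               ("rhs-client19", 8), ("rhs-client32", 8)}
    lines = []
    for host, n in layout:
        lines.append(f"Brick {host}:/bricks/brick{n}/ec-vol{n}")
        if (host, n) in pending:
            lines += ["/IOs/.../staging/iio", "/IOs/.../staging/iio/adc"]
        lines.append("Status: Connected")
    bricks = parse_heal_info("\n".join(lines))
    print(len(bricks))  # 36 bricks = 6 subvolumes x 6
    for subvol, members in pending_by_subvol(bricks).items():
        print(subvol, members)
```

With the brick order above, the four bricks reporting entries sit at indices 25, 26, 27 and 29, all within subvolume 4 (bricks 24 to 29).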

getfattr output of the entry pending heal, from each brick of its subvolume:

[root@rhs-client25 ec-vol7]# getfattr -d -m . -e hex /bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551


[root@rhs-client32 ec-vol7]# getfattr -d -m . -e hex /bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick7/ec-vol7/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client18 ec-vol8]# getfattr -d -m . -e hex IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
# file: IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client19 ec-vol8]# getfattr -d -m . -e hex IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
# file: IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551

[root@rhs-client25 ~]# getfattr -d -m . -e hex /bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c


[root@rhs-client32 ec-vol8]# getfattr -d -m . -e hex /bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick8/ec-vol8/IOs/kernel/dhcp43-51.lab.eng.blr.redhat.com/dir.29/linux-5.3.2/Documentation/devicetree/bindings/staging/iio
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.version=0x00000000000000010000000000000005
trusted.gfid=0xb58b73fcbab7458e91b92be3b394296c
trusted.glusterfs.dht=0x0000000000000000aaaaaaa8d5555551
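To compare the xattrs above across bricks, a small decoder helps. This is a sketch under stated assumptions: `trusted.ec.version` and `trusted.ec.dirty` are taken to be two big-endian 64-bit counters (data first, then metadata), and a gfid's backend entry is taken to live at `.glusterfs/<first two hex digits>/<next two>/<gfid>` under the brick root, which matches the path quoted in the glustershd section. The function names are illustrative, not a Gluster API.

```python
import uuid


def _raw(hex_value):
    """Turn a getfattr `-e hex` value such as 0xabcd into bytes."""
    return bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)


def decode_ec_pair(hex_value):
    """Split trusted.ec.version / trusted.ec.dirty into (data, metadata).

    Assumed layout: two big-endian 64-bit counters, data counter first.
    """
    raw = _raw(hex_value)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return int.from_bytes(raw[:8], "big"), int.from_bytes(raw[8:], "big")


def decode_selinux(hex_value):
    """security.selinux here holds a NUL-terminated context string."""
    return _raw(hex_value).rstrip(b"\x00").decode("ascii")


def gfid_backend_path(brick_root, gfid_hex):
    """Map trusted.gfid to its assumed .glusterfs/<aa>/<bb>/<uuid> entry."""
    gfid = str(uuid.UUID(bytes=_raw(gfid_hex)))
    return f"{brick_root.rstrip('/')}/.glusterfs/{gfid[:2]}/{gfid[2:4]}/{gfid}"


if __name__ == "__main__":
    version = "0x00000000000000010000000000000005"
    dirty = "0x00000000000000020000000000000002"
    # trusted.ec.* values of the iio directory per brick, copied from above;
    # None means the xattr was absent from that brick's getfattr output.
    observed = {
        "rhs-client25:/bricks/brick7/ec-vol7": (version, None),
        "rhs-client32:/bricks/brick7/ec-vol7": (version, dirty),
        "rhs-client18:/bricks/brick8/ec-vol8": (version, dirty),
        "rhs-client19:/bricks/brick8/ec-vol8": (version, dirty),
        "rhs-client25:/bricks/brick8/ec-vol8": (None, None),
        "rhs-client32:/bricks/brick8/ec-vol8": (version, dirty),
    }
    for brick, (ver, drt) in observed.items():
        print(brick,
              decode_ec_pair(ver) if ver else "no version xattr",
              decode_ec_pair(drt) if drt else "no dirty xattr")
    print(gfid_backend_path("/bricks/brick8/ec-vol8",
                            "0xb58b73fcbab7458e91b92be3b394296c"))
```

Decoded this way, every brick that has `trusted.ec.version` reports (1, 5), the four bricks listed in heal info carry a dirty value of (2, 2), and rhs-client25:/bricks/brick8/ec-vol8 has neither `trusted.ec.version` nor `trusted.ec.dirty`.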

#########################################
glustershd log

The gfid of the entry pending heal is b58b73fc-bab7-458e-91b9-2be3b394296c (backend path /bricks/brick8/ec-vol8/.glusterfs/b5/8b/b58b73fc-bab7-458e-91b9-2be3b394296c). Self-heal daemon (shd) log messages for this gfid:

[2019-12-26 06:39:03.374721] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
[2019-12-26 06:39:03.393421] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
[2019-12-26 06:39:03.394838] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-14: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394852] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-13: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394949] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-15: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.394992] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-17: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.444968] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-16: remote operation failed [Invalid argument]
[2019-12-26 06:39:03.529382] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]
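The shd messages above name the failing brick only as `ec-vol-client-N`. The sketch below tallies such lines per client index, error string and disperse subvolume. It assumes that `<vol>-client-N` is the N-th brick (0-based) in volume order, the same order heal info prints, so that its subvolume is `N // 6` for a 6x(4+2) volume. `LOG_RE` and `summarize` are hypothetical names, and the regex only covers the line shape seen here.

```python
import re
from collections import Counter

# Hypothetical parser for glustershd log lines shaped like the ones above.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<src>[^\]]+)\] \d+-(?P<vol>[\w-]+)-client-(?P<idx>\d+): "
    r"(?P<msg>.*?)(?: \[(?P<err>[^\]]+)\])?$"
)


def summarize(lines, disperse_count=6):
    """Count (client index, subvolume, error) across matching log lines.

    Assumes <vol>-client-N is the N-th brick (0-based) in volume order, so
    its disperse subvolume is N // disperse_count. Non-matching lines are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            idx = int(m["idx"])
            counts[(idx, idx // disperse_count, m["err"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2019-12-26 06:39:03.374721] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ec-vol-client-28: remote operation failed. Path: <gfid:b58b73fc-bab7-458e-91b9-2be3b394296c> (b58b73fc-bab7-458e-91b9-2be3b394296c) [No such file or directory]",
        "[2019-12-26 06:39:03.394838] E [MSGID: 114031] [client-rpc-fops_v2.c:1345:client4_0_inodelk_cbk] 0-ec-vol-client-14: remote operation failed [Invalid argument]",
    ]
    for key, n in sorted(summarize(sample).items()):
        print(key, n)
```

Under the brick-order assumption, client-28 falls in subvolume 4, the same subvolume as the bricks reporting the iio entries in heal info, and would correspond to rhs-client25:/bricks/brick8/ec-vol8; the inodelk failures on client-13 through client-17 fall in subvolume 2.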

Comment 7 Pranith Kumar K 2020-02-25 07:49:15 UTC
Moving to the next BU, as https://bugzilla.redhat.com/show_bug.cgi?id=1640148 has been moved to 3.5.3.

