Description of problem:
-----------------------
10*(2+1) volume, mounted via FUSE on multiple clients. Each client ran a different workload generator - Bonnie, dbench, kernel untar, iozone etc.

Replaced ten data bricks on one of the nodes to trigger an MTSH. Heal seems to be stuck somewhere; it has shown 2 entries pending heal since yesterday:

<snip>
[root@gqas013 glusterfs]# gluster v heal butcher info
Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/A1
<gfid:7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1>
<gfid:8384ce29-1ea6-4564-8d70-4efd4f3a4dfb>
Status: Connected
Number of entries: 2

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Status: Connected
Number of entries: 0

Brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/brick1
Status: Connected
Number of entries: 0

Brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Status: Connected
Number of entries: 0

Brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Status: Connected
Number of entries: 0

[root@gqas013 glusterfs]#
</snip>

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-27

How reproducible:
-----------------
1/1
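For reference, the brick replacement that triggers this kind of heal, and the commands used to watch it, look roughly like the following. This is a sketch only: the new brick path is a hypothetical example, and only the volume name and hosts are taken from this report.

# Sketch: replace one data brick onto a fresh path and let the self-heal daemon pick it up.
# /bricks1/brick1_new is a hypothetical destination, not the path actually used here.
gluster volume replace-brick butcher \
    gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1 \
    gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1_new \
    commit force

# Watch heal progress; in this bug the count stayed stuck at 2 entries on one brick.
gluster volume heal butcher info
gluster volume heal butcher statistics heal-count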
Additional info:
----------------
[root@gqas013 glusterfs]# gluster v info

Volume Name: butcher
Type: Distributed-Replicate
Volume ID: 602030c8-6926-4024-a188-2bf83fadcc1e
Status: Started
Snapshot Count: 0
Number of Bricks: 10 x (2 + 1) = 30
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1 (arbiter)
Brick4: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/brick1
Brick5: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick6: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/A1 (arbiter)
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/brick1
Brick8: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/A1 (arbiter)
Brick10: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/brick1
Brick11: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick12: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/A1 (arbiter)
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/brick1
Brick14: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/A1 (arbiter)
Brick16: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick1
Brick17: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick18: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/A1 (arbiter)
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/brick1
Brick20: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/A1 (arbiter)
Brick22: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/brick1
Brick23: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick24: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/A1 (arbiter)
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/brick1
Brick26: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1 (arbiter)
Brick28: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/brick1
Brick29: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick30: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
server.event-threads: 4
client.event-threads: 4
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 655536
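For context, a 10 x (2 + 1) arbiter volume with this brick layout would be created along the following lines. This is a sketch based on the brick list above, not the exact command used for this setup:

# Sketch: distributed-replicate volume made of 10 replica-3 subvolumes, each
# with one arbiter brick. Only the first triplet is spelled out; the remaining
# nine follow the same gqas013/gqas005/gqas006 pattern shown above.
gluster volume create butcher replica 3 arbiter 1 \
    gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick1 \
    gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/A1 \
    gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/A1 \
    ...                 # bricks2 through bricks10 triplets go here
gluster volume start butcher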
[root@gqas005 86]# getfattr -d -m . -e hex 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
# file: 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x7486c2e2202144cea3a2a4dffa5e9ac1

[root@gqas005 86]#

[root@gqas005 84]# getfattr -d -m . -e hex 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
# file: 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x8384ce291ea645648d704efd4f3a4dfb

[root@gqas005 84]#
(In reply to Ambarish from comment #2)
> [root@gqas005 86]# getfattr -d -m . -e hex 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
> # file: 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
> trusted.afr.dirty=0x000000010000000000000000
> trusted.gfid=0x7486c2e2202144cea3a2a4dffa5e9ac1
>
> [root@gqas005 86]#
>
> [root@gqas005 84]# getfattr -d -m . -e hex 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
> # file: 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
> trusted.afr.dirty=0x000000010000000000000000
> trusted.gfid=0x8384ce291ea645648d704efd4f3a4dfb
>
> [root@gqas005 84]#

Data heal appears to be pending.
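For readers outside the AFR team: trusted.afr.dirty (like the other trusted.afr.* changelog xattrs) is three big-endian 32-bit counters for data, metadata and entry operations, so 0x00000001|00000000|00000000 means one pending data operation and nothing pending for metadata or entries, i.e. a data heal. A quick way to split it out (a sketch, assuming the brick root /bricks8/A1 on gqas005 as in this report):

# Sketch: decode the 12-byte AFR changelog into its data/metadata/entry
# counters (network byte order). The path is the gfid hard link under
# .glusterfs on the brick that still holds the file.
getfattr -n trusted.afr.dirty -e hex \
    /bricks8/A1/.glusterfs/74/86/7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1 2>/dev/null |
  awk -F 0x '/trusted.afr.dirty/ {
      print "data=0x" substr($2, 1, 8),
            "metadata=0x" substr($2, 9, 8),
            "entry=0x" substr($2, 17, 8)
  }'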
Examined the setup and noted the following:

1. The 2nd brick in the replica pair containing brick-8 is the one which has the pending heals for 2 gfids (as seen from the heal info output). I have taken one file (gfid) as an example for the steps below.

2. The file *and* its parent dir are present only in the 2nd brick:

[root@gqas005 A1]# find . -samefile .glusterfs/74/86/7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
./.glusterfs/74/86/7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
./run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT

They are not present in the 1st and arbiter bricks. Their getfattrs are:

[root@gqas005 A1]# g ./run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT
# file: run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x7486c2e2202144cea3a2a4dffa5e9ac1

[root@gqas005 A1]# g ./run3223/clients/client6/~dmtmp/PWRPNT
# file: run3223/clients/client6/~dmtmp/PWRPNT
trusted.gfid=0x0c2c29a6eea64e07b7e1989580d7681f
trusted.glusterfs.dht=0x00000001000000004ccccccb66666663

3. It is being shown in the heal-info output because the gfid is present in .glusterfs/indices/dirty (as opposed to .glusterfs/indices/xattrop). This, along with the dirty bit being set above, means the pre-op happened and then the client lost connection to the brick during a write transaction. I see a lot of disconnect messages in the 4 FUSE mount logs to this brick, i.e. butcher-client-22 (and also to many other bricks), and corresponding messages in the brick log:

bricks/bricks8-A1.log:994:[2017-06-11 11:45:05.920177] W [inodelk.c:399:pl_inodelk_log_cleanup] 0-butcher-server: releasing lock on 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1 held by {client=0x7f8b64ff7c90, pid=3257 lk-owner=a05e46b8827f0000}
bricks/bricks8-A1.log:1008:[2017-06-11 11:45:05.920602] I [MSGID: 115013] [server-helpers.c:296:do_fd_cleanup] 0-butcher-server: fd cleanup on /run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT

4. The file (NEWTIPS.PPT) is then deleted by the FUSE client, and the unlink succeeds only on bricks 1 and 3.

5. Now an rm -rf comes on the parent directory, the readdir for which is served from brick 1 or brick 3, where NEWTIPS.PPT is not present. Consequently, when the rmdir comes on PWRPNT, it succeeds on bricks 1 and 3 but fails on brick 2:

bricks/bricks8-A1.log:1080:[2017-06-11 11:45:16.723007] I [MSGID: 115056] [server-rpc-fops.c:460:server_rmdir_cbk] 0-butcher-server: 474130: RMDIR /run3223/clients/client6/~dmtmp/PWRPNT (41a0f073-e019-4f8e-83cb-29de67a5d1bc/PWRPNT) ==> (Directory not empty) [Directory not empty]

In both 4 and 5, the application gets success because quorum is met. Ambarish confirmed that the file/dir in question is a temp file that is deleted by fssanity, and hence the unlink/rm -rf are expected operations.

Thus self-heal is not able to heal the files because neither the file nor its parent directory is present on the other two bricks of the replica. This looks like a day-one issue to me, i.e. if there are pending entry heals involving files that are to be deleted from the sink brick and an rmdir is performed on the parent dir, the dir and the file won't get removed from the sink.
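To make the stuck state concrete, the following sketch shows how the dirty-index entry can be confirmed on the sink brick (gqas005:/bricks8/A1), and the kind of manual cleanup that clears it, given that, as established above, the file is a temp file the application already deleted from the other bricks. This is a sketch under the assumptions of this comment (paths and gfid taken from this report), not a general recommendation:

BRICK=/bricks8/A1
GFID=7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1

# The gfid shows up in heal info because it sits in the "dirty" index
# rather than in "xattrop" (the index entry is a hard link named after the gfid):
ls -l $BRICK/.glusterfs/indices/dirty/
ls -l $BRICK/.glusterfs/indices/xattrop/

# The regular path and its .glusterfs hard link exist only on this brick:
find $BRICK -samefile $BRICK/.glusterfs/74/86/$GFID

# Manual cleanup on the sink only: drop the stale file, its gfid link,
# the now-empty parent dir, and the dirty-index entry, then re-check heal.
rm -f "$BRICK/run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT"
rm -f "$BRICK/.glusterfs/74/86/$GFID"
rmdir "$BRICK/run3223/clients/client6/~dmtmp/PWRPNT"
rm -f "$BRICK/.glusterfs/indices/dirty/$GFID"
gluster volume heal butcher info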
Hi Team, I've edited the Doc Text so that it can be included in the Release Notes. Please review it and let me know if you have any concerns. If no changes are required, please provide your approval.
Looks good to me, Pratik.