Bug 1460629
Summary: | [Stress] : Multi-threaded healing stops progressing after some time during replace-brick on an arbitrated volume with lots of I/O from FUSE mounts. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
Component: | arbiter | Assignee: | Ravishankar N <ravishankar> |
Status: | CLOSED WONTFIX | QA Contact: | Karan Sandha <ksandha> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | rhgs-3.3 | CC: | amukherj, bturner, pmulay, ravishankar, rhinduja, rhs-bugs, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
If there is a pending entry self-heal that involves purging files from a sink brick, and an ‘rm -rf’ is executed on the parent directory before the self-heal runs, the directory and the files awaiting heal may not be removed from the sink brick. This is because the readdir for the ‘rm -rf’ is served from the source brick, where the file pending entry heal is not listed, so it is never deleted from the sink brick. Also, any pending data or metadata heals on such files are displayed in the heal-info output until the issue is fixed.
Workaround: If the files and parent directory are not present on other bricks, delete them from the sink brick. This ensures that they are no longer listed in the next heal-info output.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-11-19 06:05:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1417153 |
Description
Ambarish
2017-06-12 09:17:17 UTC
[root@gqas005 86]# getfattr -d -m . -e hex 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
# file: 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x7486c2e2202144cea3a2a4dffa5e9ac1

[root@gqas005 84]# getfattr -d -m . -e hex 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
# file: 8384ce29-1ea6-4564-8d70-4efd4f3a4dfb
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x8384ce291ea645648d704efd4f3a4dfb

(In reply to Ambarish from comment #2)

Data heal appears to be pending. I examined the setup and noted the following:

1. The 2nd brick in the replica pair containing brick-8 is the one that has the pending heals for 2 gfids (as seen from the heal info output). One file (gfid) is used as the example in the steps below.

2. The file *and* its parent directory are present only on the 2nd brick:

[root@gqas005 A1]# find . -samefile .glusterfs/74/86/7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
./.glusterfs/74/86/7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
./run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT

They are not present on the 1st and arbiter bricks.
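The trusted.afr.dirty value seen above can be read as three big-endian 32-bit counters (assumed here to be pending data, metadata and entry counts, in that order). A minimal bash sketch, under that assumption, to decode the hex string:

```shell
#!/usr/bin/env bash
# Decode a trusted.afr.* xattr value: three big-endian 32-bit counters
# (assumed layout: pending data, metadata, entry operations).
xattr=0x000000010000000000000000      # value observed on the sink brick
hex=${xattr#0x}                       # strip the 0x prefix
data=$((16#${hex:0:8}))               # first 8 hex chars -> data counter
meta=$((16#${hex:8:8}))               # next 8 -> metadata counter
entry=$((16#${hex:16:8}))             # last 8 -> entry counter
echo "data=$data metadata=$meta entry=$entry"
```

For the value on this brick it prints `data=1 metadata=0 entry=0`, matching the observation that a data write was interrupted mid-transaction.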
Their getfattrs are:

[root@gqas005 A1]# g ./run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT
# file: run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT
trusted.afr.dirty=0x000000010000000000000000
trusted.gfid=0x7486c2e2202144cea3a2a4dffa5e9ac1

[root@gqas005 A1]# g ./run3223/clients/client6/~dmtmp/PWRPNT
# file: run3223/clients/client6/~dmtmp/PWRPNT
trusted.gfid=0x0c2c29a6eea64e07b7e1989580d7681f
trusted.glusterfs.dht=0x00000001000000004ccccccb66666663

3. The gfid is shown in the heal-info output because it is present in .glusterfs/indices/dirty (as opposed to .glusterfs/indices/xattrop). This, together with the dirty bit being set above, means the pre-op happened and then the client lost its connection to the brick during a write transaction. I see a lot of disconnect messages in the 4 FUSE mount logs for this brick, i.e. butcher-client-22 (and also for many other bricks), and corresponding messages in the brick log:

bricks/bricks8-A1.log:994:[2017-06-11 11:45:05.920177] W [inodelk.c:399:pl_inodelk_log_cleanup] 0-butcher-server: releasing lock on 7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1 held by {client=0x7f8b64ff7c90, pid=3257 lk-owner=a05e46b8827f0000}
bricks/bricks8-A1.log:1008:[2017-06-11 11:45:05.920602] I [MSGID: 115013] [server-helpers.c:296:do_fd_cleanup] 0-butcher-server: fd cleanup on /run3223/clients/client6/~dmtmp/PWRPNT/NEWTIPS.PPT

4. The file (NEWTIPS.PPT) is deleted by the FUSE client, which succeeds only on bricks 1 and 3.

5. Now rm -rf comes on the parent directory; the readdir for it is served from b1 or b3, where NEWTIPS.PPT is not present. Consequently, when the rmdir comes on PWRPNT, it succeeds on bricks 1 and 3 but fails on 2:

bricks/bricks8-A1.log:1080:[2017-06-11 11:45:16.723007] I [MSGID: 115056] [server-rpc-fops.c:460:server_rmdir_cbk] 0-butcher-server: 474130: RMDIR /run3223/clients/client6/~dmtmp/PWRPNT (41a0f073-e019-4f8e-83cb-29de67a5d1bc/PWRPNT) ==> (Directory not empty) [Directory not empty]

In both 4. and 5., the application gets success because quorum is met.
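Steps 4 and 5 can be reproduced in miniature with plain directories standing in for the three bricks (hypothetical local paths, no gluster involved):

```shell
#!/usr/bin/env bash
# Three directories stand in for the replica bricks; b2 is the sink
# that still holds the file pending entry heal.
mkdir -p b1/PWRPNT b2/PWRPNT b3/PWRPNT
touch b2/PWRPNT/NEWTIPS.PPT
# The client's readdir is served from a good brick (b1), which lists
# no files, so rm -rf issues only an rmdir of PWRPNT on every brick.
rmdir b1/PWRPNT b3/PWRPNT             # succeeds on the good bricks
if ! rmdir b2/PWRPNT 2>/dev/null; then
  echo "rmdir on sink brick failed: Directory not empty"
fi
```

Because the rmdir succeeds on two of three bricks, quorum is met and the application sees success, while the sink brick is left holding both the file and its parent directory.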
Ambarish confirmed that the file/dir in question is a temp file that is deleted by fssanity, so the unlink/rm -rf are expected operations. Thus self-heal is not able to heal the files, because neither the file nor its parent is present on the other two bricks. This looks like a day-one issue to me, i.e. if there are pending entry heals involving files to be deleted from the sink brick and an rmdir is performed on the parent directory, the directory and the files won't get removed from the sink.

Hi Team, I've edited the Doc Text so that it can be part of the Release Notes. Please review it and let me know in case of any concerns. If no changes are required, please provide your approval.

Looks good to me, Pratik.
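The documented workaround (deleting the leftover files and parent directory directly from the sink brick) can be sketched as below. The brick path and gfid are illustrative stand-ins, and the sketch only mimics a brick's layout locally: a data file plus its .glusterfs gfid hardlink, both of which must be removed.

```shell
#!/usr/bin/env bash
# Mimic a sink brick's layout: the data file and its .glusterfs
# gfid hardlink (brick path and gfid are hypothetical examples).
brick=./brick2
gfid=7486c2e2-2021-44ce-a3a2-a4dffa5e9ac1
mkdir -p "$brick/.glusterfs/74/86" "$brick/PWRPNT"
touch "$brick/PWRPNT/NEWTIPS.PPT"
ln "$brick/PWRPNT/NEWTIPS.PPT" "$brick/.glusterfs/74/86/$gfid"
# Workaround sketch: remove every hardlink of the gfid from the
# sink brick, then remove the now-empty parent directory.
find "$brick" -samefile "$brick/.glusterfs/74/86/$gfid" -delete
rmdir "$brick/PWRPNT"
```

After this, the gfid should no longer appear in the next heal-info output, per the workaround note in the Doc Text. Verify first that the files are genuinely absent on the other bricks before deleting them from the sink.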