Bug 1480314
Summary: | [Distribute]: File migration failures with ESTALE and ENOENT errors during a remove-brick operation | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
Component: | distribute | Assignee: | Susant Kumar Palai <spalai> |
Status: | CLOSED WORKSFORME | QA Contact: | Sayalee <saraut> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.3 | CC: | amukherj, nbalacha, rhinduja, rhs-bugs, saraut, sraut, storage-qa-internal, tdesala |
Target Milestone: | --- | Keywords: | Reopened, Triaged, ZStream |
Target Release: | --- | Flags: | tdesala: needinfo- |
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-11-05 05:21:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ambarish
2017-08-10 16:30:23 UTC
One clarification here - rebalance itself has not failed. Some files failed to be migrated.

(In reply to Nithya Balachandran from comment #2)
> One clarification here - rebalance itself has not failed. Some files failed
> to be migrated.

Remove-brick status does show non-zero failures on each of the nodes:

[root@gqas013 ~]# gluster v remove-brick butcher gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick status

Node                                  Rebalanced-files      size   scanned   failures   skipped      status   run time in h:m:s
---------                                  -----------   -------   -------   --------   -------   ---------   -----------------
localhost                                       552180    31.5GB   2558412          1         0   completed             9:13:16
gqas005.sbu.lab.eng.bos.redhat.com              548099   239.5GB   2554745          4         0   completed             9:13:15
gqas006.sbu.lab.eng.bos.redhat.com              506016   238.6GB   2571464          1         0   completed             9:13:15
gqas008.sbu.lab.eng.bos.redhat.com              508081   394.7GB   2570975          1         0   completed             9:13:15

That is what I mean by file migration failures. If the rebalance itself had failed, it would no longer be running and the status would show "failed" instead of "in progress".

Update: we don't have the RCA yet, but here are some observations for the file /d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c. (Similar observations were found for /d1/dir12/linux-4.13-rc4/fs/overlayfs/overlayfs.h; I will confirm whether all the failures are related.)

Activities recorded for odm.c (node 8 represents gqas008, and so on):

t0 - the file was present on rep-1.
t1 - node 8 tried to migrate it from rep-1 to rep-0 (12:21:09.556685).
t2 - node 5 tried to migrate it from rep-1 to rep-8 (12:21:09.631189).
t3 - node 8 deleted the linkto file created by the migration from node 5 (attempt at 12:21:09.639110, deletion confirmed at 12:21:09.640050).
t4 - node 8 migrated the file from rep-1 to rep-0 (12:21:09.640131).
t5 - node 5's migration failed because the destination file on rep-8 had been deleted by node 8 (lookup failure at 12:21:09.640901).

So from the logs it looks like the file was initially on rep-1. rep-1 has two bricks: 1) 006/b1 and 2) 008/b1. Interestingly, migration was triggered from nodes 005 and 008. Given that both migration logs point to rep-1 as the source, and 005 is not part of rep-1, I am wondering what caused 005 to take up migration of the file; this is where the bug lies. The migration on 008 deletes the linkto file created by the migration on 005, leading to a migration failure on 005, but itself successfully migrates the file to rep-0.
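For reference, a DHT linkto file on a brick is a zero-byte file with only the sticky bit set (the "1000 permission") plus a trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the data. A hypothetical check, had the stale linkto survived on a brick (the path is odm.c's and the output is illustrative, not captured from this setup):

# stat -c '%A %s %n' /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
---------T 0 /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
# getfattr -n trusted.glusterfs.dht.linkto -e text /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
# file: bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
trusted.glusterfs.dht.linkto="butcher-replicate-1"

During migration the destination linkto points back at the source subvolume (here, rep-1) until the data move completes; the ls checks below only look for the migrated data file.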
File location post migration:

[root@gqas005 ~]# ssh root@gqas005 ls -l /bricks*/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
-rw-rw-r-- 2 root root 44436 Aug 6 21:44 /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
[root@gqas005 ~]# ssh root@gqas006 ls -l /bricks*/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
ls: cannot access /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c: No such file or directory
[root@gqas005 ~]# ssh root@gqas008 ls -l /bricks*/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
ls: cannot access /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c: No such file or directory
[root@gqas005 ~]# ssh root@gqas013 ls -l /bricks*/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c
-rw-rw-r-- 2 root root 44436 Aug 6 21:44 /bricks1/brick/d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c

I will confirm in one more comment whether all the failures are related.

Susant

Similar observations for the file d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h (and the rest of the failures). Pasting the logs in case someone wants to take a look.

[root@gqas005 ~]# ssh root@gqas005 cat /var/log/glusterfs/butcher-rebalance.log | grep "d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h"
[root@gqas005 ~]# ssh root@gqas006 cat /var/log/glusterfs/butcher-rebalance.log | grep "d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h"
[2017-08-10 12:31:29.311198] I [dht-rebalance.c:1570:dht_migrate_file] 0-butcher-dht: /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h: attempting to move from butcher-replicate-9 to butcher-replicate-0
[2017-08-10 12:31:29.315337] E [MSGID: 109023] [dht-rebalance.c:748:__dht_rebalance_create_dst_file] 0-butcher-dht: failed to open /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h on butcher-replicate-0 [No such file or directory]
[2017-08-10 12:31:29.315391] E [MSGID: 0] [dht-rebalance.c:1688:dht_migrate_file] 0-butcher-dht: Create dst failed on - butcher-replicate-0 for file - /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
[2017-08-10 12:31:29.315958] E [MSGID: 109023] [dht-rebalance.c:2757:gf_defrag_migrate_single_file] 0-butcher-dht: migrate-data failed for /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h [No such file or directory]
[root@gqas005 ~]# ssh root@gqas008 cat /var/log/glusterfs/butcher-rebalance.log | grep "d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h"
[root@gqas005 ~]# ssh root@gqas013 cat /var/log/glusterfs/butcher-rebalance.log | grep "d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h"
[2017-08-10 12:31:29.287690] I [dht-rebalance.c:1570:dht_migrate_file] 0-butcher-dht: /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h: attempting to move from butcher-replicate-10 to butcher-replicate-9
[2017-08-10 12:31:29.312523] I [MSGID: 109045] [dht-common.c:2015:dht_lookup_everywhere_cbk] 0-butcher-dht: attempting deletion of stale linkfile /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h on butcher-replicate-0 (hashed subvol is butcher-replicate-9)
[2017-08-10 12:31:29.315078] I [MSGID: 109069] [dht-common.c:1327:dht_lookup_unlink_cbk] 0-butcher-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
[2017-08-10 12:31:29.315191] I [MSGID: 109022] [dht-rebalance.c:2199:dht_migrate_file] 0-butcher-dht: completed migration of /d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h from subvolume butcher-replicate-10 to butcher-replicate-9

The file is present on rep-9:

[root@gqas005 ~]# ssh root@gqas005 ls -l /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
ls: cannot access /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h: No such file or directory
[root@gqas005 ~]# ssh root@gqas006 ls -l /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
-rw-rw-r-- 2 root root 478 Aug 6 21:44 /bricks5/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
[root@gqas005 ~]# ssh root@gqas007 ls -l /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
ls: cannot access /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h: No such file or directory
[root@gqas005 ~]# ssh root@gqas008 ls -l /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
-rw-rw-r-- 2 root root 478 Aug 6 21:44 /bricks5/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
[root@gqas005 ~]# ssh root@gqas013 ls -l /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h
ls: cannot access /bricks*/brick/d1/dir18/linux-4.13-rc4/arch/cris/include/arch-v32/arch/swab.h: No such file or directory

This is more of a theory; I cannot confirm it, as I don't think we have sufficient logs to indicate it. The file got migrated from replicate-1 to replicate-0. The first attempt was at 12:21:09.556685, from rep-1 to rep-0, on node 8 (gqas008):

[2017-08-10 12:21:09.556685] I [dht-rebalance.c:1570:dht_migrate_file] 0-butcher-dht: /d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c: attempting to move from butcher-replicate-1 to butcher-replicate-0

The second log about the file is from node 5, where dht_lookup_everywhere_cbk found the same file on two different subvolumes, rep-1 and rep-0. Now (a fair assumption) node 5 could have fetched the file from rep-0 (its local subvol) in a state where the file had the 1000 permission but not the linkto xattr yet, making it think that the file is present on two different subvolumes:

[2017-08-10 12:21:09.628679] W [MSGID: 109007] [dht-common.c:1950:dht_lookup_everywhere_cbk] 0-butcher-dht: multiple subvolumes (butcher-replicate-1 and butcher-replicate-0) have file /d1/dir16/linux-4.13-rc4/drivers/staging/rtl8188eu/hal/odm.c (preferably rename the file in the backend,and do a fresh lookup)

Will update if similar behavior is seen for the other failures as well.

There was an observation above (comment 7) that the same file got hashed to two different subvols from two different nodes. The point to note is that the layout should have been the same at that point, since the migrations were triggered at almost the same time. We are not logging the layout information anymore, hence I need to reproduce the issue on my dev setup before drawing any conclusion. Will update if I get any evidence.
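Since the layout is no longer logged, on a reproducer the layout each node acted on can only be compared by reading the DHT layout xattr straight off the brick directories while the migrations run. A hypothetical check (trusted.glusterfs.dht is the standard directory layout xattr; the hex value shown is illustrative):

# getfattr -n trusted.glusterfs.dht -e hex /bricks5/brick/d1/dir18
# file: bricks5/brick/d1/dir18
trusted.glusterfs.dht=0x0000000100000000aaaaaaaabfffffff

The last two 32-bit words of the value are the start and end of the hash range assigned to that brick's subvolume; if the per-brick ranges agree across bricks, both nodes should hash the file to the same subvolume, and differing ranges would explain the two different destinations seen above.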
- Susant

Tried to reproduce the layout issue we suspected in comment 14. Created a distributed-replicate volume:

[root@vm1 glusterfs]# gvi
Volume Name: test1
Type: Distributed-Replicate
Volume ID: c392466e-aa80-49d6-99ee-dfa8ef372cec
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: vm1:/brick/1
Brick2: vm2:/brick/1
Brick3: vm1:/brick/2
Brick4: vm2:/brick/2
Brick5: vm1:/brick/3
Brick6: vm2:/brick/3
Brick7: vm1:/brick/4
Brick8: vm2:/brick/4

Created two clients, one on vm1 and another on vm2, with one dir on the root, and issued fix-layout from each (sketched below):

command: setfattr -n "distribute.fix.layout" -v "1" /mnt/dir

The layouts remained unchanged. Added a replicate pair to trigger a layout change; even then the layout was not changed after the first fix-layout.

Note: even if the layout had not changed in the current bug, having the file picked up for migration from two subvolumes leads to a false failure, which is the current bug. I will post what should be done from the DHT point of view, regardless of whether it is intrusive or not; after that we can take a decision.

- Susant
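The reproduction attempt boils down to a handful of commands. A sketch, assuming /mnt/dir is the same mount path on both clients as in the command above (the per-brick verification loop is my addition, reading the standard DHT layout xattr):

[root@vm1 ~]# mkdir /mnt/dir
[root@vm1 ~]# setfattr -n "distribute.fix.layout" -v "1" /mnt/dir    # fix-layout from the vm1 client
[root@vm2 ~]# setfattr -n "distribute.fix.layout" -v "1" /mnt/dir    # repeat from the vm2 client
[root@vm1 ~]# for b in /brick/1 /brick/2 /brick/3 /brick/4; do getfattr -n trusted.glusterfs.dht -e hex $b/dir; done

Identical trusted.glusterfs.dht values for each brick after both fix-layout calls would match the "layouts remained unchanged" observation above.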