Description of problem:
Remove-brick operation fails while rm -rf is in progress.

How reproducible:
Always

Steps to Reproduce:
1. Create data (I used linux untar)
2. Start remove-brick process
3. Issue rm -rf * on mount

[root@vm3 upstream]# gvi
Volume Name: test1
Type: Distribute
Volume ID: de28535e-1873-429a-a5ef-9dc4814b6b93
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: vm3:/extraspace/brick/1
Brick2: vm3:/extraspace/brick/2
Brick3: vm3:/extraspace/brick/3
Brick4: vm3:/extraspace/brick/4
Brick5: vm3:/extraspace/brick/5
Brick6: vm3:/extraspace/brick/6
Brick7: vm3:/extraspace/brick/7
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on

Error messages from remove-brick log:
ile-fpga-irq.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.072971] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073518] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm6345-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073661] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073802] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-2: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073874] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-1: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073900] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-4: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074660] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-0: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074979] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075267] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/arm,vic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075768] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076057] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media failed
[2018-04-27 10:35:46.076794] E [MSGID: 109110] [dht-rebalance.c:3926:gf_defrag_fix_layout] 0-test1-dht: Settle hash failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.076957] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.077211] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings
[2018-04-27 10:35:46.077336] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree
[2018-04-27 10:35:46.077525] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation
[2018-04-27 10:35:46.077656] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16
[2018-04-27 10:35:46.078032] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cdns,xtensa-pic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.078413] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cirrus,clps711x-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.080317] I [MSGID: 109028] [dht-rebalance.c:5088:gf_defrag_status_get] 0-test1-dht: Rebalance is failed. Time taken is 104.00 secs
[2018-04-27 10:35:46.080337] I [MSGID: 109028] [dht-rebalance.c:5092:gf_defrag_status_get] 0-test1-dht: Files migrated: 729, size: 2475036, lookups: 2179, failures: 10, skipped: 0
[2018-04-27 10:35:46.082286] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f8368341e25] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xde) [0x40a3cb] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x88) [0x408875] ) 0-: received signum (15), shutting down

The fix-layout code path already has checks that ignore ENOENT and ESTALE errors, but the same checks were missing from the gf_defrag_settle_hash function. Since the directory in question was deleted as part of rm -rf *, settle_hash failed and the failure propagated up through every parent directory, failing the whole rebalance.

Debug log snippet (error 2 is ENOENT):
[2018-04-27 11:00:25.437436] E [dht-rebalance.c:3620:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/display/panel failed, error :2

REVIEW: https://review.gluster.org/19945 (dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error) posted (#2) for review on master by Susant Palai

COMMIT: https://review.gluster.org/19945 committed in master by "Jeff Darcy" <jeff.us> with a commit message-

dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error

Problem: A directory deletion can happen just before gf_defrag_settle_hash, which internally does a setxattr operation on a directory.

Solution: Ignore ENOENT and ESTALE errors

Fixes: bz#1572581
Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2
Signed-off-by: Susant Palai <spalai>
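
For readers who want to see the shape of the fix, here is a minimal, self-contained sketch of the pattern the commit message describes. It is not the actual GlusterFS code: settle_hash_sketch, dir_is_gone, settle_fn, fake_setxattr and fake_errno are hypothetical names made up for illustration, and the setxattr call is replaced by a stub callback so the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the setxattr on a directory:
 * returns 0 on success, or -1 with *op_errno set to the failure reason. */
typedef int (*settle_fn)(const char *path, int *op_errno);

/* True when the error only means the directory vanished underneath us,
 * for example because a concurrent rm -rf removed it. */
static bool dir_is_gone(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}

/* Sketch of a settle-hash step: a missing directory is treated as
 * "nothing left to do" instead of a failure of the whole operation. */
static int settle_hash_sketch(const char *path, settle_fn do_setxattr)
{
    int op_errno = 0;

    assert(do_setxattr != NULL);

    if (do_setxattr(path, &op_errno) == 0)
        return 0;

    if (dir_is_gone(op_errno))
        return 0; /* directory deleted concurrently: ignore */

    return -1;    /* any other error is still reported */
}

/* Stub used to simulate outcomes: fails with fake_errno unless it is 0. */
static int fake_errno = 0;

static int fake_setxattr(const char *path, int *op_errno)
{
    (void)path;
    if (fake_errno == 0)
        return 0;
    *op_errno = fake_errno;
    return -1;
}
```

With the stub set to ENOENT or ESTALE the sketch reports success, while an unrelated error such as EIO is still returned as a failure, which is the behaviour the commit message asks for.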

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/