+++ This bug was initially created as a clone of Bug #1572581 +++

Description of problem:
Remove-brick operation fails while rm -rf is in progress.

How reproducible:
Always

Steps to Reproduce:
1. Create data (I used linux untar)
2. Start remove-brick process
3. Issue rm -rf * on mount

[root@vm3 upstream]# gvi

Volume Name: test1
Type: Distribute
Volume ID: de28535e-1873-429a-a5ef-9dc4814b6b93
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: vm3:/extraspace/brick/1
Brick2: vm3:/extraspace/brick/2
Brick3: vm3:/extraspace/brick/3
Brick4: vm3:/extraspace/brick/4
Brick5: vm3:/extraspace/brick/5
Brick6: vm3:/extraspace/brick/6
Brick7: vm3:/extraspace/brick/7
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on

Error messages from remove-brick log:

ile-fpga-irq.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.072971] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073518] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm6345-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073661] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073802] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-2: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073874] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-1: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073900] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-4: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074660] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-0: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074979] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075267] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/arm,vic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075768] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076057] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media failed
[2018-04-27 10:35:46.076794] E [MSGID: 109110] [dht-rebalance.c:3926:gf_defrag_fix_layout] 0-test1-dht: Settle hash failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.076957] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.077211] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings
[2018-04-27 10:35:46.077336] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree
[2018-04-27 10:35:46.077525] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation
[2018-04-27 10:35:46.077656] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16
[2018-04-27 10:35:46.078032] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cdns,xtensa-pic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.078413] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cirrus,clps711x-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.080317] I [MSGID: 109028] [dht-rebalance.c:5088:gf_defrag_status_get] 0-test1-dht: Rebalance is failed. Time taken is 104.00 secs
[2018-04-27 10:35:46.080337] I [MSGID: 109028] [dht-rebalance.c:5092:gf_defrag_status_get] 0-test1-dht: Files migrated: 729, size: 2475036, lookups: 2179, failures: 10, skipped: 0
[2018-04-27 10:35:46.082286] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f8368341e25] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xde) [0x40a3cb] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x88) [0x408875] ) 0-: received signum (15), shutting down

--- Additional comment from Susant Kumar Palai on 2018-04-27 07:08:47 EDT ---

The fix-layout code path already has checks that ignore ENOENT and ESTALE errors, but the same checks were missing from the gf_defrag_settle_hash function. Since the directory in question was deleted as part of rm -rf *, settle_hash failed.

Debug log snippet:

<[2018-04-27 11:00:25.437436] E [dht-rebalance.c:3620:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/display/panel failed, error :2>

--- Additional comment from Worker Ant on 2018-04-27 07:14:48 EDT ---

REVIEW: https://review.gluster.org/19945 (dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error) posted (#2) for review on master by Susant Palai
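
For readers unfamiliar with the pattern, the kind of check the analysis says was missing can be sketched roughly as follows. This is only an illustration of "ignore ENOENT and ESTALE when the entry was deleted underneath the crawl", not the patch posted for review: entry_vanished and settle_result are hypothetical names that do not exist in GlusterFS, and the (ret, op_errno) calling convention is an assumption. On Linux, the "error :2" in the debug log snippet corresponds to ENOENT.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper (not a GlusterFS function): true when the error
 * means the entry disappeared underneath us, e.g. because a concurrent
 * rm -rf removed it. */
static bool entry_vanished(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}

/* Hypothetical stand-in for handling the result of a per-directory step
 * such as settle-hash. ret is the step's return value (0 on success,
 * -1 on failure) and op_errno is the errno it reported (assumed
 * convention). Returns 0 if the crawl may continue, -1 if the whole run
 * should be marked as failed. */
static int settle_result(int ret, int op_errno)
{
    if (ret == 0)
        return 0;               /* step succeeded */

    if (entry_vanished(op_errno))
        return 0;               /* directory is gone; nothing left to fix */

    return -1;                  /* any other error is still a failure */
}
```

With a check of this shape, a directory removed mid-crawl is skipped rather than being counted as a failure, while other errors (EIO, for example) still fail the run.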
Verified this BZ on glusterfs version 3.12.2-9.el7rhgs.x86_64. Followed the same steps as in the description; the remove-brick process on the nodes completed successfully without any failures. Hence, moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607