Bug 1572581

Summary: Remove-brick failed on Distributed volume while rm -rf is in-progress
Product: [Community] GlusterFS
Reporter: Susant Kumar Palai <spalai>
Component: distribute
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-v4.1.0
Last Closed: 2018-06-20 18:05:40 UTC
Type: Bug
Bug Blocks: 1572585

Description Susant Kumar Palai 2018-04-27 11:06:48 UTC
Description of problem:
The remove-brick operation fails while rm -rf is in progress on the mount.


How reproducible:
Always

Steps to Reproduce:
1. Create data (I used linux untar)
2. Start remove-brick process
3. Issue rm -rf * on the mount while remove-brick is still running


[root@vm3 upstream]# gvi
 
Volume Name: test1
Type: Distribute
Volume ID: de28535e-1873-429a-a5ef-9dc4814b6b93
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: vm3:/extraspace/brick/1
Brick2: vm3:/extraspace/brick/2
Brick3: vm3:/extraspace/brick/3
Brick4: vm3:/extraspace/brick/4
Brick5: vm3:/extraspace/brick/5
Brick6: vm3:/extraspace/brick/6
Brick7: vm3:/extraspace/brick/7
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on

Error messages from remove-brick log:
ile-fpga-irq.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.072971] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073518] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm6345-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073661] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073802] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-2: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073874] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-1: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073900] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-4: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074660] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-0: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074979] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075267] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/arm,vic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075768] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076057] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media failed
[2018-04-27 10:35:46.076794] E [MSGID: 109110] [dht-rebalance.c:3926:gf_defrag_fix_layout] 0-test1-dht: Settle hash failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.076957] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.077211] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings
[2018-04-27 10:35:46.077336] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree
[2018-04-27 10:35:46.077525] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation
[2018-04-27 10:35:46.077656] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16
[2018-04-27 10:35:46.078032] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cdns,xtensa-pic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.078413] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cirrus,clps711x-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.080317] I [MSGID: 109028] [dht-rebalance.c:5088:gf_defrag_status_get] 0-test1-dht: Rebalance is failed. Time taken is 104.00 secs
[2018-04-27 10:35:46.080337] I [MSGID: 109028] [dht-rebalance.c:5092:gf_defrag_status_get] 0-test1-dht: Files migrated: 729, size: 2475036, lookups: 2179, failures: 10, skipped: 0
[2018-04-27 10:35:46.082286] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f8368341e25] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xde) [0x40a3cb] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x88) [0x408875] ) 0-: received signum (15), shutting down
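A log like the one above is easier to triage when the entries are grouped by severity and emitting function. The following is a minimal sketch, not part of GlusterFS: the regular expression is inferred only from the lines shown in this report (timestamp, level letter, optional MSGID, then file:line:function), and the names LOG_LINE, SAMPLE_LINES, and summarize are hypothetical. Lines that do not fit the pattern, such as the truncated first line of the excerpt, are skipped.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this report (an assumption,
# not an official description of the GlusterFS log format).
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<rest>.*)$"
)

# A few lines copied verbatim from the remove-brick log above.
SAMPLE_LINES = [
    "ile-fpga-irq.txt lookup failed [Stale file handle]",
    "[2018-04-27 10:35:46.072971] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote operation failed [No such file or directory]",
    "[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media failed",
    "[2018-04-27 10:35:46.076957] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings/media",
]


def summarize(lines):
    """Count log entries per (level, function); skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # truncated or continuation line
        counts[(match["level"], match["func"])] += 1
    return counts


if __name__ == "__main__":
    for (level, func), count in sorted(summarize(SAMPLE_LINES).items()):
        print(level, func, count)
```

Run against the full log, a summary of this kind would show the single gf_defrag_settle_hash error followed by the chain of gf_defrag_fix_layout errors up the directory tree.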

Comment 1 Susant Kumar Palai 2018-04-27 11:08:47 UTC
The fix-layout code path already has checks that ignore ENOENT and ESTALE errors, but the gf_defrag_settle_hash function lacked them. Because the directory in question was deleted as part of rm -rf *, settle_hash failed.

Debug log snippet (error 2 is ENOENT):
<[2018-04-27 11:00:25.437436] E [dht-rebalance.c:3620:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/display/panel failed, error :2>
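The idea of tolerating a directory that disappears underneath the rebalance process can be sketched as follows. This is an illustration in Python, not the actual GlusterFS C code: settle_hash and IGNORABLE are hypothetical names, the commit callable stands in for the directory update that settle-hash performs, and os.utime is used only as a convenient operation that fails with ENOENT on a removed directory.

```python
import errno
import os
import tempfile

# Errors that indicate the directory is already gone (assumed set, taken
# from the errors named in this bug: ENOENT and ESTALE).
IGNORABLE = {errno.ENOENT, errno.ESTALE}


def settle_hash(path, commit):
    """Hypothetical stand-in for a settle-hash step.

    Runs commit(path) and returns 0 on success or when the failure only
    means the directory was deleted concurrently; returns -1 otherwise.
    """
    try:
        commit(path)
    except OSError as exc:
        if exc.errno in IGNORABLE:
            return 0  # directory vanished, nothing left to settle
        return -1  # a real failure that should still be reported
    return 0


if __name__ == "__main__":
    # Simulate the race: the directory is removed before the update runs.
    victim = tempfile.mkdtemp()
    os.rmdir(victim)
    print(settle_hash(victim, os.utime))
```

The design choice is to treat only these two error codes as benign, so that other failures (for example permission errors) still mark the operation as failed.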

Comment 2 Worker Ant 2018-04-27 11:14:48 UTC
REVIEW: https://review.gluster.org/19945 (dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error) posted (#2) for review on master by Susant Palai

Comment 3 Worker Ant 2018-04-30 19:36:30 UTC
COMMIT: https://review.gluster.org/19945 committed in master by "Jeff Darcy" <jeff.us> with a commit message- dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error

Problem: A directory deletion can happen just before gf_defrag_settle_hash
which internally does a setxattr operation on a directory.

Solution: Ignore ENOENT and ESTALE errors

Fixes: bz#1572581
Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2
Signed-off-by: Susant Palai <spalai>

Comment 4 Shyamsundar 2018-06-20 18:05:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/