Bug 1572585 - Remove-brick failed on Distributed volume while rm -rf is in-progress
Summary: Remove-brick failed on Distributed volume while rm -rf is in-progress
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: RHGS 3.4.0
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On: 1572581
Blocks: 1503137
 
Reported: 2018-04-27 11:23 UTC by Prasad Desala
Modified: 2018-09-18 06:41 UTC (History)

Fixed In Version: glusterfs-3.12.2-9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1572581
Environment:
Last Closed: 2018-09-04 06:47:18 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 0 None None None 2018-09-04 06:48:40 UTC

Description Prasad Desala 2018-04-27 11:23:16 UTC
+++ This bug was initially created as a clone of Bug #1572581 +++

Description of problem:
The remove-brick operation fails while rm -rf is in progress on the mount.


How reproducible:
Always

Steps to Reproduce:
1. Create data (I used linux untar)
2. Start remove-brick process
3. Issue rm -rf * on the mount point while remove-brick is still running


[root@vm3 upstream]# gvi
 
Volume Name: test1
Type: Distribute
Volume ID: de28535e-1873-429a-a5ef-9dc4814b6b93
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: vm3:/extraspace/brick/1
Brick2: vm3:/extraspace/brick/2
Brick3: vm3:/extraspace/brick/3
Brick4: vm3:/extraspace/brick/4
Brick5: vm3:/extraspace/brick/5
Brick6: vm3:/extraspace/brick/6
Brick7: vm3:/extraspace/brick/7
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on

Error messages from remove-brick log:
ile-fpga-irq.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.072971] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073518] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm6345-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073661] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.073802] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-2: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073874] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-1: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.073900] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-4: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074660] W [MSGID: 114031] [client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-0: remote operation failed [No such file or directory]
[2018-04-27 10:35:46.074979] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075267] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/arm,vic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.075768] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076057] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media failed
[2018-04-27 10:35:46.076794] E [MSGID: 109110] [dht-rebalance.c:3926:gf_defrag_fix_layout] 0-test1-dht: Settle hash failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.076957] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.077211] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree/bindings
[2018-04-27 10:35:46.077336] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation/devicetree
[2018-04-27 10:35:46.077525] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16/Documentation
[2018-04-27 10:35:46.077656] E [MSGID: 109016] [dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /linux-4.16
[2018-04-27 10:35:46.078032] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cdns,xtensa-pic.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.078413] E [MSGID: 109023] [dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file failed: /linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cirrus,clps711x-intc.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.080317] I [MSGID: 109028] [dht-rebalance.c:5088:gf_defrag_status_get] 0-test1-dht: Rebalance is failed. Time taken is 104.00 secs
[2018-04-27 10:35:46.080337] I [MSGID: 109028] [dht-rebalance.c:5092:gf_defrag_status_get] 0-test1-dht: Files migrated: 729, size: 2475036, lookups: 2179, failures: 10, skipped: 0
[2018-04-27 10:35:46.082286] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f8368341e25] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xde) [0x40a3cb] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x88) [0x408875] ) 0-: received signum (15), shutting down

--- Additional comment from Susant Kumar Palai on 2018-04-27 07:08:47 EDT ---

The fix-layout code path already has checks that ignore ENOENT and ESTALE errors, but the same checks were missing from the gf_defrag_settle_hash function. Since the directory in question was deleted as part of rm -rf *, settle_hash failed.

Debug log snippet:
<[2018-04-27 11:00:25.437436] E [dht-rebalance.c:3620:gf_defrag_settle_hash] 0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/display/panel failed, error :2>

--- Additional comment from Worker Ant on 2018-04-27 07:14:48 EDT ---

REVIEW: https://review.gluster.org/19945 (dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error) posted (#2) for review on master by Susant Palai

Comment 6 Prasad Desala 2018-05-15 12:49:53 UTC
Verified this BZ on glusterfs version 3.12.2-9.el7rhgs.x86_64.

Followed the same steps as in the description. The remove-brick process on the nodes completed successfully without any failures.

Hence, moving this BZ to Verified.

Comment 8 errata-xmlrpc 2018-09-04 06:47:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

