Bug 1815462 - [RHEL 8.1] Migration error may appear during remove-brick
Summary: [RHEL 8.1] Migration error may appear during remove-brick
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Tamar Shacked
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-20 10:49 UTC by Sayalee
Modified: 2021-10-05 07:56 UTC

Fixed In Version: glusterfs-6.0-57
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-05 07:56:26 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1979936 | 1 | low | CLOSED | Rebalance: [No data available] error may appear in Rebalance/Brick logs | 2022-07-08 04:27:24 UTC
Red Hat Product Errata | RHBA-2021:3729 | 0 | None | None | None | 2021-10-05 07:56:39 UTC

Internal Links: 1979936

Description Sayalee 2020-03-20 10:49:29 UTC
Description of problem:
-----------------------
While performing a remove-brick operation on a 4x3 volume with heterogeneous replica sets, the remove-brick failed with an I/O error.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
RHGS build info : 6.0-30
RHEL info : Red Hat Enterprise Linux release 8.1 (Ootpa)

How reproducible:
----------------
1/1

Steps to Reproduce:
-------------------
1. Create a 3x3 volume (all replica sets have 20G bricks) and start it.
2. Mount the volume using FUSE.
3. Perform the following I/O on the mount point (at the volume root):
# for i in {1..500}; do dd if=/dev/urandom of=file$i bs=100K count=1; chmod 755 file$i; ln file$i hfile$i; setfattr -n user.test -v "foobar" file$i; chmod +t file$i; chmod +s file$i; mv -vf file$i zile$i; done
4. Add bricks to the volume (one replica set with 50G bricks).
5. Trigger rebalance twice: first a plain rebalance, then a rebalance with force.
(Note: There was no particular reason for triggering rebalance twice; it was only to check the behaviour.)
6. Rebalance completes successfully.
7. Perform the following I/O on the mount point:
# pwd
/mnt/vol1
# mkdir dir2
# cd dir2
# for i in {1..20}; do mknod cfile$i c 20 10; done
# for i in {1..20}; do mknod bfile$i b 20 10; done
# for i in {1..20}; do mknod pfile$i p; done
8. Start a remove-brick operation on the volume to remove a replica set that consists of 20G bricks.
9. Check the remove-brick status.
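
For reference, the gluster CLI calls behind steps 1, 2, 4, 5, 8 and 9 might look roughly like the sketch below. The hostnames (server1 to server3), the brick paths, and the choice of which 20G replica set to remove are hypothetical and were not part of the original report; only the volume name vol1 and the mount point /mnt/vol1 come from the steps above. The script only prints the commands unless DRY_RUN=0 is set, and it leaves out the I/O from steps 3 and 7 as well as the wait for the first rebalance to finish.

```shell
#!/bin/sh
# Sketch of the CLI side of the reproducer (steps 1, 2, 4, 5, 8, 9).
# ASSUMPTIONS: hostnames, brick paths and the removed replica set are
# hypothetical; the report does not state them.

VOL=vol1
HOSTS="server1 server2 server3"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Emit the three bricks of one replica set, one brick per host.
replica_set() {
    for h in $HOSTS; do
        printf '%s:/bricks/%s/brick ' "$h" "$1"
    done
}

reproduce_cli_steps() {
    # Step 1: 3x3 volume built from three replica sets (20G bricks).
    run gluster volume create "$VOL" replica 3 \
        $(replica_set small1) $(replica_set small2) $(replica_set small3)
    run gluster volume start "$VOL"
    # Step 2: FUSE mount.
    run mount -t glusterfs "server1:/$VOL" "/mnt/$VOL"
    # Step 4: add a fourth replica set (50G bricks), giving a 4x3 volume.
    run gluster volume add-brick "$VOL" replica 3 $(replica_set large1)
    # Step 5: plain rebalance, then rebalance with force.
    # (On a real cluster, wait for the first run to complete in between.)
    run gluster volume rebalance "$VOL" start
    run gluster volume rebalance "$VOL" start force
    # Steps 8 and 9: remove one of the 20G replica sets and check status.
    run gluster volume remove-brick "$VOL" $(replica_set small1) start
    run gluster volume remove-brick "$VOL" $(replica_set small1) status
}

reproduce_cli_steps
```

The dry-run default keeps the sketch safe to paste on a machine that has no gluster cluster configured.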

Actual results:
---------------
Failures in remove-brick due to:
[2020-03-20 05:18:01.543862] E [MSGID: 114031] [client-rpc-fops_v2.c:150:client4_0_mknod_cbk] 0-vol1-client-14: remote operation failed. Path: /hfile4755 [No data available] 
[2020-03-20 05:33:50.563489] E [MSGID: 114031] [client-rpc-fops_v2.c:150:client4_0_mknod_cbk] 0-vol1-client-14: remote operation failed. Path: /zile1385 [No data available] 
[2020-03-20 05:33:52.173692] E [dht-helper.c:1863:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0xf796) [0x7fe83f344796] -->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0x3d5d1) [0x7fe83f3725d1] -->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0xe63c) [0x7fe83f34363c] ) 0-vol1-dht: invalid argument: stat [Invalid argument]
[2020-03-20 05:33:52.174209] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol1-replicate-3: Failing REMOVEXATTR on gfid f717aa19-b9e4-4a2b-a9b9-e86f938b2491: split-brain observed. [Input/output error]
[2020-03-20 05:33:52.179431] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol1-replicate-3: Failing REMOVEXATTR on gfid f717aa19-b9e4-4a2b-a9b9-e86f938b2491: split-brain observed. [Input/output error] 
[2020-03-20 05:47:35.261384] E [MSGID: 109034] [dht-common.c:1973:dht_lookup_unlink_of_false_linkto_cbk] 0-vol1-dht: Could not unlink the linkto file as either fd is open and/or linkto xattr is set for /zile1808 [Device or resource busy] 
[2020-03-20 05:47:35.261536] E [MSGID: 109023] [dht-rebalance.c:2751:gf_defrag_migrate_single_file] 0-vol1-dht: Migrate file failed: /zile1808 lookup failed [Input/output error]
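
To see which of these errors dominate a larger log, a small tally by MSGID can help. The sketch below is not part of the original report. It assumes one log entry per line in the format shown above, counts only error-level ("E") entries, and groups entries without a MSGID under "none". The function name summarize_errors is made up for illustration.

```shell
#!/bin/sh
# Tally error-level ("E") gluster log entries by MSGID.
# Reads log lines on stdin; prints "<count> <msgid>", highest count first.
summarize_errors() {
    awk '$3 == "E" {
        # The third whitespace-separated field is the log level.
        id = "none"
        if (match($0, /MSGID: [0-9]+/))
            id = substr($0, RSTART + 7, RLENGTH - 7)  # skip "MSGID: "
        count[id]++
    }
    END { for (id in count) print count[id], id }' |
    sort -k1,1nr -k2,2
}
```

A possible invocation, assuming the rebalance log sits at its usual location for a volume named vol1, would be: summarize_errors < /var/log/glusterfs/vol1-rebalance.log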

Expected results:
----------------
The remove-brick operation should complete without I/O error failures.

Additional info:
----------------
sos-reports will be shared.

Comment 27 errata-xmlrpc 2021-10-05 07:56:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729

