Description of problem:
-----------------------
While doing a remove-brick operation on a 4X3 volume with heterogeneous replica sets, the remove-brick failed with an I/O error.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
RHGS build info : 6.0-30
RHEL info : Red Hat Enterprise Linux release 8.1 (Ootpa)

How reproducible:
-----------------
1/1

Steps to Reproduce:
-------------------
1. Create a 3X3 volume (all replica sets have 20G bricks) and start it.
2. Mount the volume using FUSE.
3. Perform the following I/O on the mount point (on the volume root):

# for i in {1..500}; do dd if=/dev/urandom of=file$i bs=100K count=1; chmod 755 file$i; ln file$i hfile$i; setfattr -n user.test -v "foobar" file$i; chmod +t file$i; chmod +s file$i; mv -vf file$i zile$i; done

4. Add a brick set to the volume (a replica set with 50G bricks).
5. Trigger rebalance twice: first a plain rebalance, then a rebalance with force.
   (Note: there was no particular reason for triggering rebalance twice; it was just to check the behaviour.)
6. Rebalance completes successfully.
7. Perform the following I/O on the mount point:

# pwd
/mnt/vol1
# mkdir dir2
# cd dir2
# for i in {1..20}; do mknod cfile$i c 20 10; done
# for i in {1..20}; do mknod bfile$i b 20 10; done
# for i in {1..20}; do mknod pfile$i p; done

8. Now, start a remove-brick operation on the volume to remove the replica set consisting of 20G bricks.
9. Check the remove-brick status.

Actual results:
---------------
Failures in remove-brick due to:

[2020-03-20 05:18:01.543862] E [MSGID: 114031] [client-rpc-fops_v2.c:150:client4_0_mknod_cbk] 0-vol1-client-14: remote operation failed. Path: /hfile4755 [No data available]
[2020-03-20 05:33:50.563489] E [MSGID: 114031] [client-rpc-fops_v2.c:150:client4_0_mknod_cbk] 0-vol1-client-14: remote operation failed. Path: /zile1385 [No data available]
[2020-03-20 05:33:52.173692] E [dht-helper.c:1863:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0xf796) [0x7fe83f344796] -->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0x3d5d1) [0x7fe83f3725d1] -->/usr/lib64/glusterfs/6.0/xlator/cluster/distribute.so(+0xe63c) [0x7fe83f34363c] ) 0-vol1-dht: invalid argument: stat [Invalid argument]
[2020-03-20 05:33:52.174209] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol1-replicate-3: Failing REMOVEXATTR on gfid f717aa19-b9e4-4a2b-a9b9-e86f938b2491: split-brain observed. [Input/output error]
[2020-03-20 05:33:52.179431] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol1-replicate-3: Failing REMOVEXATTR on gfid f717aa19-b9e4-4a2b-a9b9-e86f938b2491: split-brain observed. [Input/output error]
[2020-03-20 05:47:35.261384] E [MSGID: 109034] [dht-common.c:1973:dht_lookup_unlink_of_false_linkto_cbk] 0-vol1-dht: Could not unlink the linkto file as either fd is open and/or linkto xattr is set for /zile1808 [Device or resource busy]
[2020-03-20 05:47:35.261536] E [MSGID: 109023] [dht-rebalance.c:2751:gf_defrag_migrate_single_file] 0-vol1-dht: Migrate file failed: /zile1808 lookup failed [Input/output error]

Expected results:
----------------
There should not be failures due to I/O error in remove-brick.

Additional info:
----------------
sos-reports will be shared.
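For reference, steps 4, 5, 8 and 9 above can be sketched with the standard gluster CLI. The volume name (vol1, taken from the mount path) is the only value from this report; the server hostnames (server1..server3) and brick paths are hypothetical. This is a dry-run sketch: each command is echoed rather than executed, so it can be reviewed before running against a real cluster.

```shell
#!/bin/sh
# Dry-run sketch of the add-brick / rebalance / remove-brick sequence.
# Server names and brick paths are hypothetical placeholders; "run"
# only echoes each gluster command instead of executing it.
run() { echo "+ $*"; }

VOL=vol1

# Step 4: add a replica set with 50G bricks
run gluster volume add-brick "$VOL" replica 3 \
    server1:/bricks/50g/brick1 server2:/bricks/50g/brick2 server3:/bricks/50g/brick3

# Step 5: plain rebalance first, then rebalance with force
run gluster volume rebalance "$VOL" start
run gluster volume rebalance "$VOL" status
run gluster volume rebalance "$VOL" start force

# Step 8: start removing the replica set with 20G bricks
# (data migration off the bricks begins at "start")
run gluster volume remove-brick "$VOL" replica 3 \
    server1:/bricks/20g/brick1 server2:/bricks/20g/brick2 server3:/bricks/20g/brick3 start

# Step 9: check migration progress before committing the removal
run gluster volume remove-brick "$VOL" replica 3 \
    server1:/bricks/20g/brick1 server2:/bricks/20g/brick2 server3:/bricks/20g/brick3 status
```

On a real cluster the removal would be finalized with `remove-brick ... commit` once the status shows the migration completed; here the failures surfaced at the status/migration stage.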
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729