Description of problem:
-----------------------
On a distributed-replicated volume, while heal was still in progress, a new replica set was added and rebalance was triggered. The rebalance reported failures caused by I/O errors.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEL info - Red Hat Enterprise Linux release 8.2 (Ootpa)
RHGS build info - 6.0-31

How reproducible:
-----------------
1/1

Steps to Reproduce:
-------------------
1) Create a 2X3 volume.
2) Mount the volume using FUSE and give 777 permissions to the mount.
3) Add a new user.
4) Log in as the new user and create 100 files:
   # for i in {1..100}; do dd if=/dev/urandom of=$i bs=1024 count=1; done
5) Kill a brick that is part of the volume.
6) On the mount, log in as the root user and create 1000 files:
   # for i in {1..1000} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
7) Start the volume using force.
8) Start a full heal on the volume manually.
9) Let the heal complete.
10) Kill another brick that is part of the volume.
11) On the mount, log in as the new user and copy existing data to the mount:
   # cp -r /home/linux-4.6.4 /mnt/vol2/test3/
12) Start the volume using force.
13) Start a full heal on the volume manually.
14) While the heal is in progress, run add-brick and start rebalance.
15) Wait for rebalance to complete.

Actual results:
---------------
Failures in rebalance due to -

[2020-04-06 12:20:44.237976] I [dht-rebalance.c:3492:gf_defrag_process_dir] 0-vol2-dht: Migration operation on dir /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox took 0.00 secs
[2020-04-06 12:20:44.242722] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol2-replicate-0: Failing SETXATTR on gfid 4ec7e2f3-2088-41ce-a177-1438886007ba: split-brain observed. [Input/output error]
[2020-04-06 12:20:44.244860] E [MSGID: 109016] [dht-rebalance.c:3559:gf_defrag_settle_hash] 0-vol2-dht: fix layout on /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox failed [Input/output error]
[2020-04-06 12:20:44.244884] E [MSGID: 109110] [dht-rebalance.c:3991:gf_defrag_fix_layout] 0-vol2-dht: Settle hash failed for /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox

Expected results:
-----------------
Rebalance should not have any failures due to I/O errors and split-brain.

Additional info:
---------------
sos-reports will be shared.
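
For triage, the error entries in a rebalance log like the one quoted above can be summarised with a small shell helper. This is only a sketch: the function names `count_errors_by_msgid` and `failed_fix_layout_dirs` are hypothetical, and the regular expressions assume log lines in the `[timestamp] E [MSGID: N] ...` layout seen in this report, with one entry per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GlusterFS) for summarising a log
# whose lines look like: [timestamp] E [MSGID: NNNNNN] [file:line:func] ...

# Print "<MSGID> <count>" for every error-level ("E") entry in the log.
count_errors_by_msgid() {
    local logfile="$1"
    grep -E '^\[[^]]+\] E \[MSGID: [0-9]+\]' "$logfile" \
        | sed -E 's/^\[[^]]+\] E \[MSGID: ([0-9]+)\].*/\1/' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Print each directory for which "fix layout on <dir> failed" was logged,
# de-duplicated and sorted.
failed_fix_layout_dirs() {
    local logfile="$1"
    sed -n -E 's/.*fix layout on (.*) failed.*/\1/p' "$logfile" | sort -u
}
```

Both functions take the path of the log file as their only argument, for example `count_errors_by_msgid /var/log/glusterfs/vol2-rebalance.log`. That path is an assumption about where the rebalance log for `vol2` is stored and may differ on a given installation.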
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462