Bug 1821599 - [RHEL 8.2] Failures in rebalance due to [Input/output error]
Summary: [RHEL 8.2] Failures in rebalance due to [Input/output error]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 4
Assignee: Ravishankar N
QA Contact: Veera Raghava Reddy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-07 07:29 UTC by Sayalee
Modified: 2021-04-29 07:20 UTC
8 users

Fixed In Version: glusterfs-6.0-50
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-29 07:20:36 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2021:1462 (last updated 2021-04-29 07:20:53 UTC)

Description Sayalee 2020-04-07 07:29:10 UTC
Description of problem:
-----------------------
On a distributed-replicated volume, while heal was still in progress, a new replica set was added and rebalance was triggered. The rebalance failed with I/O errors.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEL info - Red Hat Enterprise Linux release 8.2 (Ootpa)
RHGS build info - 6.0-31


How reproducible:
-----------------
1/1


Steps to Reproduce:
-------------------
1) Create a 2x3 (distributed-replicated) volume.
2) Mount the volume using FUSE and give 777 permissions to the mount point.
3) Add a new user.
4) Log in as the new user and create 100 files:
# for i in {1..100}; do dd if=/dev/urandom of=$i bs=1024 count=1; done
5) Kill one brick of the volume.
6) On the mount, log in as root and create 1000 files:
# for i in {1..1000} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
7) Start the volume using force.
8) Start a full heal on the volume manually.
9) Let the heal complete.
10) Kill another brick of the volume.
11) On the mount, log in as the new user and copy existing data to the mount:
# cp -r /home/linux-4.6.4 /mnt/vol2/test3/
12) Start the volume using force.
13) Start a full heal on the volume manually.
14) While the heal is in progress, add a brick and start rebalance.
15) Wait for rebalance to complete.
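The file-creation loops in steps 4 and 6 can be exercised standalone; a minimal sketch, writing into a scratch directory instead of the FUSE mount, with sizes and counts reduced so it finishes quickly (the scratch directory and reduced counts are assumptions for illustration, not part of the original reproduction):

```shell
# Sketch of the dd loops from steps 4 and 6, run against a scratch
# directory rather than the gluster mount. Sizes/counts are reduced
# (1 KiB files, 10 instead of 1000) purely to keep the sketch fast.
workdir=$(mktemp -d)

# Step 4: 100 small files (created as the new user in the report)
for i in {1..100}; do dd if=/dev/urandom of="$workdir/$i" bs=1024 count=1 2>/dev/null; done

# Step 6: files named f1, f2, ... (1000 x 10 MiB in the report)
for i in {1..10}; do dd if=/dev/urandom of="$workdir/f$i" bs=1024 count=1 2>/dev/null; done

ls "$workdir" | wc -l   # 110 files
```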


Actual results:
---------------
Failures in rebalance due to:
[2020-04-06 12:20:44.237976] I [dht-rebalance.c:3492:gf_defrag_process_dir] 0-vol2-dht: Migration operation on dir /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox took 0.00 secs 
[2020-04-06 12:20:44.242722] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol2-replicate-0: Failing SETXATTR on gfid 4ec7e2f3-2088-41ce-a177-1438886007ba: split-brain observed. [Input/output error] 
[2020-04-06 12:20:44.244860] E [MSGID: 109016] [dht-rebalance.c:3559:gf_defrag_settle_hash] 0-vol2-dht: fix layout on /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox failed [Input/output error] 
[2020-04-06 12:20:44.244884] E [MSGID: 109110] [dht-rebalance.c:3991:gf_defrag_fix_layout] 0-vol2-dht: Settle hash failed for /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox
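The failing entries can be pulled out of the rebalance log with grep; a minimal sketch, using the two E-level lines above inlined as a stand-in for the real log (on RHGS the rebalance log is normally found under /var/log/glusterfs/ as <volname>-rebalance.log; the /tmp path here is just for the sketch):

```shell
# Count the I/O-error failures and extract the gfids flagged as
# split-brain from a rebalance log. The excerpt from this report is
# inlined as a stand-in for the real rebalance log file.
cat > /tmp/sample-rebalance.log <<'EOF'
[2020-04-06 12:20:44.242722] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol2-replicate-0: Failing SETXATTR on gfid 4ec7e2f3-2088-41ce-a177-1438886007ba: split-brain observed. [Input/output error]
[2020-04-06 12:20:44.244860] E [MSGID: 109016] [dht-rebalance.c:3559:gf_defrag_settle_hash] 0-vol2-dht: fix layout on /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox failed [Input/output error]
EOF

# How many operations failed with EIO:
grep -c 'Input/output error' /tmp/sample-rebalance.log

# Which gfids were reported as split-brain:
grep -o 'gfid [0-9a-f-]*' /tmp/sample-rebalance.log
```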


Expected results:
-----------------
Rebalance should complete without failures caused by I/O errors or split-brain.


Additional info:
---------------
sos-reports will be shared.

Comment 31 errata-xmlrpc 2021-04-29 07:20:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462

