Bug 1821599

Summary: [RHEL 8.2] Failures in rebalance due to [Input/output error]
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sayalee <saraut>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: Veera Raghava Reddy <vereddy>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.5
CC: pasik, pprakash, puebele, ravishankar, rhs-bugs, rkothiya, sheggodu, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.5.z Batch Update 4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0-50
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-29 07:20:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sayalee 2020-04-07 07:29:10 UTC
Description of problem:
-----------------------
On a distributed-replicated volume, a new replica set was added and a rebalance was triggered while a heal was still in progress. The rebalance reported failures caused by I/O errors.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEL info - Red Hat Enterprise Linux release 8.2 (Ootpa)
RHGS build info - 6.0-31


How reproducible:
-----------------
1/1


Steps to Reproduce:
-------------------
1) Create a 2x3 distributed-replicated volume.
2) Mount the volume using FUSE and set 777 permissions on the mount point.
3) Add a new user.
4) Log in as the new user and create 100 files:
# for i in {1..100}; do dd if=/dev/urandom of=$i bs=1024 count=1; done
5) Kill one brick that is part of the volume.
6) On the mount, as the root user, create 1000 files:
# for i in {1..1000}; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
7) Start the volume with force.
8) Manually start a full heal on the volume.
9) Let the heal complete.
10) Kill another brick that is part of the volume.
11) On the mount, as the new user, copy existing data to the mount:
# cp -r /home/linux-4.6.4 /mnt/vol2/test3/
12) Start the volume with force.
13) Manually start a full heal on the volume.
14) While the heal is in progress, add a brick and start a rebalance.
15) Wait for the rebalance to complete.
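The brick-kill, heal, add-brick, and rebalance steps above can be sketched with the standard gluster CLI. This is a hedged outline, not the exact commands used during the test: the volume name vol2 is taken from the rebalance log in this report, while the hostname (server1), brick paths, and <brick-pid> are placeholders.

```shell
# Step 5/10: kill one brick process (find its PID in the volume status output)
gluster volume status vol2        # note the PID of the target brick
kill -9 <brick-pid>               # <brick-pid> is a placeholder from the output above

# Step 7/12: force-start the volume so the killed brick process is respawned
gluster volume start vol2 force

# Step 8/13: trigger a full heal and monitor its progress
gluster volume heal vol2 full
gluster volume heal vol2 info

# Step 14: while the heal is still running, add a replica set and rebalance
gluster volume add-brick vol2 replica 3 \
    server1:/bricks/brick6 server1:/bricks/brick7 server1:/bricks/brick8
gluster volume rebalance vol2 start
gluster volume rebalance vol2 status
```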


Actual results:
---------------
Failures in rebalance due to -
[2020-04-06 12:20:44.237976] I [dht-rebalance.c:3492:gf_defrag_process_dir] 0-vol2-dht: Migration operation on dir /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox took 0.00 secs 
[2020-04-06 12:20:44.242722] E [MSGID: 108008] [afr-transaction.c:2877:afr_write_txn_refresh_done] 0-vol2-replicate-0: Failing SETXATTR on gfid 4ec7e2f3-2088-41ce-a177-1438886007ba: split-brain observed. [Input/output error] 
[2020-04-06 12:20:44.244860] E [MSGID: 109016] [dht-rebalance.c:3559:gf_defrag_settle_hash] 0-vol2-dht: fix layout on /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox failed [Input/output error] 
[2020-04-06 12:20:44.244884] E [MSGID: 109110] [dht-rebalance.c:3991:gf_defrag_fix_layout] 0-vol2-dht: Settle hash failed for /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox
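The SETXATTR failure above shows AFR reporting a split-brain on the directory's gfid, which then causes the DHT fix-layout to fail with EIO. As a hedged illustration, the state can be inspected and (for file split-brains) resolved with the standard gluster heal CLI; the volume name is taken from the log, and the path argument is only an example:

```shell
# List entries the self-heal daemon currently considers split-brained
gluster volume heal vol2 info split-brain

# For a file split-brain, one of the CLI resolution policies can be applied,
# e.g. keeping the copy with the latest modification time (path illustrative):
gluster volume heal vol2 split-brain latest-mtime \
    /test3/linux-4.6.4/Documentation/devicetree/bindings/mailbox

# For a gfid/entry split-brain, choosing a source brick may be needed instead:
# gluster volume heal vol2 split-brain source-brick <host:/brick/path> <file>
```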


Expected results:
-----------------
Rebalance should complete without failures due to I/O errors or split-brain.


Additional info:
---------------
sos-reports will be shared.
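For reference, sos reports on Gluster nodes are typically collected with the sos tool; a minimal sketch, assuming the standard gluster sos plugin is available:

```shell
# Collect an sos report on each node, including the gluster plugin
sosreport --batch -o gluster

# The logs relevant to this bug usually live under /var/log/glusterfs/,
# e.g. the rebalance log (name follows the VOLNAME-rebalance.log convention):
ls /var/log/glusterfs/vol2-rebalance.log
```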

Comment 31 errata-xmlrpc 2021-04-29 07:20:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462