Bug 1401869

Summary:	Rebalance triggered after adding a couple of bricks did not succeed.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Byreddy <bsrirama>
Component:	replicate	Assignee:	Pranith Kumar K <pkarampu>
Status:	CLOSED ERRATA	QA Contact:	Nag Pavan Chilakam <nchilaka>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.2	CC:	amukherj, ksandha, rcyriac, rhinduja, rhs-bugs, spalai, storage-qa-internal
Target Milestone:	---
Target Release:	RHGS 3.2.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.8.4-8	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-03-23 05:54:49 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1351528, 1400037

Description Byreddy 2016-12-06 10:06:17 UTC

Description of problem:
=======================
Rebalance status showed failures when a rebalance was triggered after adding a couple of bricks to a 2x2 volume.


# gluster volume rebalance Dis-Rep status 
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             1             1             0            completed        0:0:0
                            10.70.41.217                0        0Bytes             0             0             0            completed        0:0:10
volume rebalance: Dis-Rep: success
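
Note that the command ends with "success" even though the failures column for localhost is 1. As a rough aid for spotting this, the sketch below filters the status table for nodes with a non-zero failures count. It assumes the column layout shown above (failures is the fifth whitespace-separated field); the function name rebalance_failures is made up for illustration and is not part of the gluster CLI.

```shell
# Print "<node> <failures>" for every row whose failures column is non-zero.
# Assumes the column order shown in this report:
#   Node Rebalanced-files size scanned failures skipped status run-time
rebalance_failures() {
    awk 'NF >= 8 && $5 ~ /^[0-9]+$/ && $5 > 0 { print $1, $5 }'
}

# Demo on the rows from this report (header and separator lines are ignored
# because their fifth field is not a number).
rebalance_failures <<'EOF'
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 1 1 0 completed 0:0:0
10.70.41.217 0 0Bytes 0 0 0 completed 0:0:10
volume rebalance: Dis-Rep: success
EOF
```

On a live system the same filter could be fed from the CLI, for example: gluster volume rebalance Dis-Rep status | rebalance_failures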



Errors in Glusterd Log:
-----------------------
[2016-12-06 09:28:41.210721] E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 2 times between [2016-12-06 09:28:41.210721] and [2016-12-06 09:28:46.241538]

Errors in rebalance log:
------------------------
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
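
To pull only the error-level entries out of a log like the one above, a small grep filter can help. This is a minimal sketch: it assumes the "[timestamp] E ..." line format seen in this report, and the helper name gluster_log_errors is hypothetical. The location of the rebalance log on a given installation is not stated in this report, so the file path is left to the caller.

```shell
# Print only lines whose severity letter (right after the timestamp) is "E".
# Reads the files given as arguments, or standard input if none are given.
gluster_log_errors() {
    grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+\] E ' "$@"
}

# Demo on the three rebalance log lines quoted in this report.
gluster_log_errors <<'EOF'
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
EOF
```

Only the "Setxattr failed" line is printed for this input.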


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64



How reproducible:
=================
Reproduced once

Steps to Reproduce:
===================
1. Set up a 2-node cluster.
2. Create a 2x2 volume.
3. Mount the volume using gnfs (v3) and untar the Linux kernel source in the mount point.
4. Add a couple of bricks.
5. Trigger the rebalance: gluster volume rebalance <vol-name> start
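
The steps above might look roughly like the following when scripted. This is only a sketch: the host names (node1, node2), brick paths, mount point, and tarball location are hypothetical placeholders, not values taken from this report, and adding exactly one replica pair in step 4 is an assumption. By default the script only prints the commands; set DRY_RUN=0 to actually run them on a test cluster.

```shell
#!/bin/sh
# Hypothetical placeholders; adjust to the cluster under test.
VOL=${VOL:-Dis-Rep}
N1=${N1:-node1}
N2=${N2:-node2}
MNT=${MNT:-/mnt/gnfs}

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 2: 2x2 volume (four bricks, replica 2).
    run gluster volume create "$VOL" replica 2 \
        "$N1:/bricks/b1" "$N2:/bricks/b1" "$N1:/bricks/b2" "$N2:/bricks/b2"
    run gluster volume start "$VOL"
    # Step 3: mount over NFSv3 (gnfs) and untar the kernel source.
    run mount -t nfs -o vers=3 "$N1:/$VOL" "$MNT"
    run tar -xf /tmp/linux-4.8.8.tar.xz -C "$MNT"
    # Step 4: add a couple of bricks (assumed here: one more replica pair).
    run gluster volume add-brick "$VOL" "$N1:/bricks/b3" "$N2:/bricks/b3"
    # Step 5: trigger the rebalance, then check its status.
    run gluster volume rebalance "$VOL" start
    run gluster volume rebalance "$VOL" status
}

reproduce
```

Since the issue was hit only once, a run of this script is not guaranteed to show the failure.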

Actual results:
===============
Rebalance failed.


Expected results:
=================
Rebalance should complete successfully.


Additional info:

Comment 4 Susant Kumar Palai 2016-12-07 09:52:14 UTC

I was able to reproduce the issue. After adding logs, I saw that the blocking inodelk was failing with EAGAIN.

[2016-12-06 12:16:04.329982] E [MSGID: 109118] [dht-helper.c:2081:dht_blocking_inodelk_cbk] 0-test1-dht: inodelk failed with Resource temporarily unavailable on subvol test1-replicate-0 [Resource temporarily unavailable]
[2016-12-06 12:16:04.330109] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-test1-dht: Setxattr failed for /dir2


There is an issue in AFR: on receiving a BLOCKING inodelk, AFR tries to acquire a non-blocking inodelk. If that attempt fails, AFR passes the error back to the parent translators.

Pranith has already sent a patch for this: http://review.gluster.org/#/c/15984/

Moving the component to AFR.
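
The difference between a blocking and a non-blocking lock request described in this comment can be illustrated with an analogy on a local file. The sketch below uses flock(1) from util-linux, not Gluster's inodelk, and says nothing about how AFR is implemented. It only shows why a caller that asked for a blocking lock does not expect an immediate "would block" failure (the EAGAIN case).

```shell
#!/bin/sh
# Analogy only: flock(1) on a temporary file, not a Gluster inodelk.
lockfile=$(mktemp)

# Holder: take the lock and keep it for 3 seconds.
flock "$lockfile" sleep 3 &
sleep 1   # give the holder time to acquire the lock

# Non-blocking request: fails immediately while the lock is held.
if flock -n "$lockfile" true; then
    nb_result=acquired
else
    nb_result=would-block
fi

# Blocking request: waits until the holder releases the lock.
if flock "$lockfile" true; then
    b_result=acquired
else
    b_result=failed
fi

wait
rm -f "$lockfile"

echo "non-blocking: $nb_result"
echo "blocking: $b_result"
```

If a blocking request were silently served as a non-blocking one, the caller would see the first outcome where it expected the second, which corresponds to the inodelk failure that DHT logged above.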

Comment 8 Susant Kumar Palai 2016-12-08 06:23:02 UTC

*** Bug 1400037 has been marked as a duplicate of this bug. ***

Comment 10 Nag Pavan Chilakam 2016-12-13 14:37:43 UTC

QATP:
====
Added bricks and ran a rebalance on a 2x2 => 3x2 volume while I/O was in progress.
Did not see any failures.
Also removed bricks, and that passed too.

Ran the same with gnfs as well.


Hence moving to verified.
Test version: 3.8.4-8

Comment 12 errata-xmlrpc 2017-03-23 05:54:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html