Bug 1401869

Summary: Rebalance failed when triggered after adding a couple of bricks.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Byreddy <bsrirama>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: amukherj, ksandha, rcyriac, rhinduja, rhs-bugs, spalai, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-23 05:54:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1400037    

Description Byreddy 2016-12-06 10:06:17 UTC
Description of problem:
=======================
Rebalance status showed failures when the rebalance was triggered after adding a couple of bricks to a 2x2 volume.


# gluster volume rebalance Dis-Rep status 
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             1             1             0            completed        0:0:0
                            10.70.41.217                0        0Bytes             0             0             0            completed        0:0:10
volume rebalance: Dis-Rep: success
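
Note that the command itself reports "success" even though the localhost row has a non-zero failures count, so the per-node columns have to be checked. A minimal sketch of such a check, assuming the column order shown in the output above (the names `NodeStatus`, `parse_rebalance_status` and `nodes_with_failures` are hypothetical helpers, not part of gluster):

```python
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    run_time: str


def parse_rebalance_status(text: str) -> List[NodeStatus]:
    """Parse the per-node rows of `gluster volume rebalance <vol> status`.

    Assumes the column order shown in this report; header, separator and
    summary lines are skipped because their second field is not an integer.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 8 or not tokens[1].isdigit():
            continue
        rows.append(NodeStatus(
            node=tokens[0],
            rebalanced_files=int(tokens[1]),
            size=tokens[2],
            scanned=int(tokens[3]),
            failures=int(tokens[4]),
            skipped=int(tokens[5]),
            # Everything between the counters and the run time is the status,
            # which keeps multi-word statuses in one field.
            status=" ".join(tokens[6:-1]),
            run_time=tokens[-1],
        ))
    return rows


def nodes_with_failures(text: str) -> List[str]:
    """Return the nodes whose failures column is greater than zero."""
    return [row.node for row in parse_rebalance_status(text) if row.failures > 0]


# Output from this report, with the column whitespace condensed.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 0 0Bytes 1 1 0 completed 0:0:0
10.70.41.217 0 0Bytes 0 0 0 completed 0:0:10
volume rebalance: Dis-Rep: success
"""

if __name__ == "__main__":
    print(nodes_with_failures(SAMPLE))
```

For the output in this report, only localhost is flagged.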



Errors in Glusterd Log:
-----------------------
[2016-12-06 09:28:41.210721] E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 2 times between [2016-12-06 09:28:41.210721] and [2016-12-06 09:28:46.241538]

Errors in rebalance log:
------------------------
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
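
To pull only the error-level entries out of a log like the one above, something along these lines can be used. It is a rough sketch: the regular expression is inferred from the log lines quoted in this report (timestamp, one-letter level, optional MSGID, file:line:function, domain, message) and is not an official description of the gluster log format. `LOG_RE` and `error_entries` are hypothetical names.

```python
import re
from typing import Dict, List, Optional

# Pattern inferred from the lines quoted in this report; the MSGID block
# is optional because some entries do not carry one.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def error_entries(text: str) -> List[Dict[str, Optional[str]]]:
    """Return the parsed fields of every line logged at level E."""
    entries = []
    for line in text.splitlines():
        match = LOG_RE.match(line.strip())
        if match and match.group("level") == "E":
            entries.append(match.groupdict())
    return entries


SAMPLE_LOG = """\
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
"""

if __name__ == "__main__":
    for entry in error_entries(SAMPLE_LOG):
        print(entry["ts"], entry["func"], entry["msg"])
```

Lines that do not match the pattern, such as the "The message ... repeated 2 times" summary in the glusterd log, are skipped.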


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64



How reproducible:
=================
One time

Steps to Reproduce:
===================
1. Set up a 2-node cluster
2. Create a 2x2 volume
3. Mount the volume using gnfs (v3) and untar the Linux kernel in the mount point
4. Add a couple of bricks
5. Trigger the rebalance: gluster volume rebalance <vol-name> start

Actual results:
===============
Rebalance failed.


Expected results:
=================
Rebalance should complete successfully.


Additional info:

Comment 4 Susant Kumar Palai 2016-12-07 09:52:14 UTC
I was able to reproduce the issue. After adding extra logging, I saw the blocking inodelk failing with EAGAIN.

[2016-12-06 12:16:04.329982] E [MSGID: 109118] [dht-helper.c:2081:dht_blocking_inodelk_cbk] 0-test1-dht: inodelk failed with Resource temporarily unavailable on subvol test1-replicate-0 [Resource temporarily unavailable]
[2016-12-06 12:16:04.330109] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-test1-dht: Setxattr failed for /dir2


There is an issue in AFR: on receiving a BLOCKING inodelk request, AFR tries to take a non-blocking inodelk. If that attempt fails, it passes the error back to the parent translators.
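
The failure mode can be illustrated outside gluster with an ordinary mutex. This is only an analogy using Python's `threading.Lock`, not AFR code: `buggy_blocking_lock` and `blocking_lock_with_fallback` are hypothetical names, and falling back to a real blocking acquire is one plausible way to honour the caller's request; the actual change in the patch may differ.

```python
import errno
import threading


def buggy_blocking_lock(lock: threading.Lock) -> None:
    """Caller asked for a blocking lock, but only a non-blocking attempt is made.

    If the lock is already held, the EAGAIN-style error is handed straight
    back to the caller, mirroring the behaviour described in this comment.
    """
    if not lock.acquire(blocking=False):
        raise BlockingIOError(errno.EAGAIN, "Resource temporarily unavailable")


def blocking_lock_with_fallback(lock: threading.Lock) -> None:
    """Try the cheap non-blocking attempt first, then really block.

    Assumption for illustration: on contention the caller waits for the
    lock instead of seeing EAGAIN.
    """
    if lock.acquire(blocking=False):
        return
    lock.acquire()  # blocks until the current holder releases the lock


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()  # simulate another client holding the inodelk
    try:
        buggy_blocking_lock(lock)
    except BlockingIOError as exc:
        print("buggy path failed:", exc.strerror)

    # Release the lock shortly afterwards from another thread.
    threading.Timer(0.1, lock.release).start()
    blocking_lock_with_fallback(lock)
    print("fallback path acquired the lock")
    lock.release()
```

With the first function, any contention on the lock surfaces as an error to the caller (here, DHT's fix-layout), even though the caller was prepared to wait.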

Pranith has already sent a patch for this: http://review.gluster.org/#/c/15984/

Moving the component to AFR.

Comment 8 Susant Kumar Palai 2016-12-08 06:23:02 UTC
*** Bug 1400037 has been marked as a duplicate of this bug. ***

Comment 10 Nag Pavan Chilakam 2016-12-13 14:37:43 UTC
QATP:
====
Added bricks and ran a rebalance on a 2x2 => 3x2 volume while I/O was in progress.
Didn't see any failures.
Also removed bricks, and that passed.

Ran with gnfs too


Hence moving to verified.
Test version: 3.8.4-8

Comment 12 errata-xmlrpc 2017-03-23 05:54:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html