Bug 1401869 - Rebalance failed when triggered after adding a couple of bricks.
Summary: Rebalance failed when triggered after adding a couple of bricks.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1400037
 
Reported: 2016-12-06 10:06 UTC by Byreddy
Modified: 2017-03-23 05:54 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.8.4-8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 05:54:49 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Byreddy 2016-12-06 10:06:17 UTC
Description of problem:
=======================
Rebalance status showed failures after rebalance was triggered following the addition of a couple of bricks to a 2x2 volume.


# gluster volume rebalance Dis-Rep status 
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             1             1             0            completed        0:0:0
                            10.70.41.217                0        0Bytes             0             0             0            completed        0:0:10
volume rebalance: Dis-Rep: success



Errors in Glusterd Log:
-----------------------
[2016-12-06 09:28:41.210721] E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 2 times between [2016-12-06 09:28:41.210721] and [2016-12-06 09:28:46.241538]

Errors in rebalance log:
------------------------
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
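For reference, a minimal way to pull these messages out of the default RHGS log locations (assuming the standard /var/log/glusterfs/ directory and a volume named Dis-Rep; the glusterd log file name and the volume name may differ on your setup):

# grep 'failed to get index' /var/log/glusterfs/*glusterd*.log
# grep -E 'Setxattr failed|inodelk failed' /var/log/glusterfs/Dis-Rep-rebalance.log

The first command matches the glusterd error shown above; the second matches the rebalance-side failures.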


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64



How reproducible:
=================
One time

Steps to Reproduce:
===================
1. Have a 2-node cluster
2. Create a 2x2 (distributed-replicate) volume
3. Mount the volume using gnfs (NFSv3) and untar the Linux kernel source at the mount point
4. Add a couple of bricks
5. Trigger the rebalance: gluster volume rebalance <vol-name> start (see the shell sketch below)
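A minimal shell sketch of the reproduction steps above (hostnames, brick paths, and the kernel tarball name are illustrative; the volume name Dis-Rep matches the report):

(from server1, after peering with server2)
# gluster peer probe server2

(create and start a 2x2 distributed-replicate volume)
# gluster volume create Dis-Rep replica 2 server1:/bricks/b1 server2:/bricks/b1 server1:/bricks/b2 server2:/bricks/b2
# gluster volume start Dis-Rep

(mount over gnfs / NFSv3 and untar a kernel source tree)
# mount -t nfs -o vers=3 server1:/Dis-Rep /mnt/dis-rep
# tar -xf linux-4.8.8.tar.xz -C /mnt/dis-rep

(add a couple of bricks, expanding 2x2 to 3x2, then trigger rebalance)
# gluster volume add-brick Dis-Rep replica 2 server1:/bricks/b3 server2:/bricks/b3
# gluster volume rebalance Dis-Rep start
# gluster volume rebalance Dis-Rep status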

Actual results:
===============
Rebalance failed.


Expected results:
=================
Rebalance should complete successfully.


Additional info:

Comment 4 Susant Kumar Palai 2016-12-07 09:52:14 UTC
I was able to reproduce the issue. After adding logs, I saw that the blocking inodelk was failing with EAGAIN.

[2016-12-06 12:16:04.329982] E [MSGID: 109118] [dht-helper.c:2081:dht_blocking_inodelk_cbk] 0-test1-dht: inodelk failed with Resource temporarily unavailable on subvol test1-replicate-0 [Resource temporarily unavailable]
[2016-12-06 12:16:04.330109] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-test1-dht: Setxattr failed for /dir2


There is an issue in AFR: on receiving a BLOCKING inodelk request, AFR first attempts a non-blocking inodelk, and when that fails it passes the error back to the parent translators instead of falling back to a blocking lock.

Pranith has already sent a patch for this: http://review.gluster.org/#/c/15984/.

Moving the component to AFR.

Comment 8 Susant Kumar Palai 2016-12-08 06:23:02 UTC
*** Bug 1400037 has been marked as a duplicate of this bug. ***

Comment 10 Nag Pavan Chilakam 2016-12-13 14:37:43 UTC
QATP:
====
added bricks and did a rebalance on 2x2=>3x2 volume while IO is happening
Didn't see any failures
Did even remove bricks and it passed

Ran with gnfs too


hence moving to verified
test version:3.8.4-8
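For completeness, a rough sketch of the verification flow described in this comment (volume name, hostnames, and brick paths are illustrative):

(expand 2x2 -> 3x2 while I/O runs on the mount, then rebalance and check for failures)
# gluster volume add-brick Dis-Rep replica 2 server1:/bricks/b3 server2:/bricks/b3
# gluster volume rebalance Dis-Rep start
# gluster volume rebalance Dis-Rep status

(shrink back again to exercise the remove-brick path)
# gluster volume remove-brick Dis-Rep replica 2 server1:/bricks/b3 server2:/bricks/b3 start
# gluster volume remove-brick Dis-Rep replica 2 server1:/bricks/b3 server2:/bricks/b3 status
# gluster volume remove-brick Dis-Rep replica 2 server1:/bricks/b3 server2:/bricks/b3 commit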

Comment 12 errata-xmlrpc 2017-03-23 05:54:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

