Description of problem:
=======================
Rebalance status showed some failures, triggered after adding a couple of bricks to a 2x2 volume.

# gluster volume rebalance Dis-Rep status
        Node   Rebalanced-files    size    scanned   failures   skipped    status      run time in h:m:s
   ---------        -----------  ------   --------   --------   -------    ------      -----------------
   localhost                  0  0Bytes          1          1         0    completed              0:0:0
10.70.41.217                  0  0Bytes          0          0         0    completed             0:0:10
volume rebalance: Dis-Rep: success

Errors in glusterd log:
-----------------------
[2016-12-06 09:28:41.210721] E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index
The message "E [MSGID: 106062] [glusterd-utils.c:9188:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd: failed to get index" repeated 2 times between [2016-12-06 09:28:41.210721] and [2016-12-06 09:28:46.241538]

Errors in rebalance log:
------------------------
[2016-12-06 09:28:46.511955] I [MSGID: 109081] [dht-common.c:4006:dht_setxattr] 0-Dis-Rep-dht: fixing the layout of /linux-4.8.8
[2016-12-06 09:28:46.516510] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-Dis-Rep-dht: Setxattr failed for /linux-4.8.8
[2016-12-06 09:28:46.525333] I [dht-rebalance.c:3884:gf_defrag_start_crawl] 0-DHT: crawling file-system completed

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64

How reproducible:
=================
One time

Steps to Reproduce:
===================
1. Have a 2-node cluster.
2. Create a 2x2 volume.
3. Mount the volume using gNFS (NFSv3) and untar the Linux kernel into the mount point.
4. Add a couple of bricks.
5. Trigger the rebalance: gluster volume rebalance <vol-name> start

Actual results:
===============
Rebalance failed.

Expected results:
=================
Rebalance should complete successfully.

Additional info:
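The steps above can be sketched as a CLI session (hostnames node1/node2 and brick paths are placeholders, not from this report; this requires a live 2-node cluster):

```shell
# 2. Create and start a 2x2 distributed-replicate volume
gluster volume create Dis-Rep replica 2 \
    node1:/bricks/b1 node2:/bricks/b1 \
    node1:/bricks/b2 node2:/bricks/b2
gluster volume start Dis-Rep

# 3. Mount over gNFS (NFSv3) and untar a kernel tree into it
mount -t nfs -o vers=3 node1:/Dis-Rep /mnt/disrep
tar -xf linux-4.8.8.tar.xz -C /mnt/disrep

# 4. Add a pair of bricks (2x2 -> 3x2)
gluster volume add-brick Dis-Rep node1:/bricks/b3 node2:/bricks/b3

# 5. Trigger the rebalance and check its status
gluster volume rebalance Dis-Rep start
gluster volume rebalance Dis-Rep status
```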
I was able to reproduce the issue. After adding logs, I saw that the blocking inodelk is failing with EAGAIN:

[2016-12-06 12:16:04.329982] E [MSGID: 109118] [dht-helper.c:2081:dht_blocking_inodelk_cbk] 0-test1-dht: inodelk failed with Resource temporarily unavailable on subvol test1-replicate-0 [Resource temporarily unavailable]
[2016-12-06 12:16:04.330109] E [dht-rebalance.c:3348:gf_defrag_fix_layout] 0-test1-dht: Setxattr failed for /dir2

There is an issue in AFR: on receiving a BLOCKING inodelk request, AFR first tries to acquire a non-blocking inodelk, and on failure it passes the error back to the parent translators instead of falling back to an actual blocking lock. Pranith has already sent a patch for this: http://review.gluster.org/#/c/15984/

Moving the component to AFR.
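The expected fallback behavior (try the fast non-blocking lock first; if that fails under contention, wait instead of returning the error) can be illustrated with a small flock(1) analogy. This is a sketch, not GlusterFS code; it assumes Linux with util-linux flock and uses a hypothetical lock file path:

```shell
LOCKFILE=/tmp/afr-inodelk-demo.lock

# A background process holds the lock for ~3 seconds (simulates a contending client).
( exec 9>"$LOCKFILE"; flock 9; sleep 3 ) &
sleep 1   # let the background holder acquire the lock first

exec 8>"$LOCKFILE"
if flock -n 8; then                # non-blocking attempt (the "trylock" fast path)
    PATH_TAKEN="fast"
else
    # Instead of propagating the EAGAIN-style failure, fall back to a
    # genuinely blocking acquisition -- the behavior the patch restores.
    flock 8
    PATH_TAKEN="fallback"
fi
echo "lock acquired via: $PATH_TAKEN"
```

Under contention the script prints "lock acquired via: fallback" after waiting, rather than failing outright the way the rebalance's setxattr did.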
*** Bug 1400037 has been marked as a duplicate of this bug. ***
QATP:
=====
Added bricks and did a rebalance on a 2x2 => 3x2 volume while I/O was happening; didn't see any failures. Also did a remove-brick, and that passed as well. Ran with gNFS too, hence moving to VERIFIED.

Test version: 3.8.4-8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html