Description of problem:
=======================
Seeing error "Failed to get the total number of files. Unable to estimate time to complete rebalance" in rebalance logs.

Version-Release number of selected component (if applicable):
3.8.4-24.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) FUSE mount on a client.
3) Add a few bricks and start rebalance. (This error is seen during remove-brick as well.)

Note: The error says it is unable to estimate the time to complete rebalance, but the estimated time is still shown during add-brick and remove-brick rebalance.

Actual results:
===============
Seeing error "Failed to get the total number of files. Unable to estimate time to complete rebalance" in rebalance logs.

Expected results:
=================
We should not see error "Failed to get the total number of files. Unable to estimate time to complete rebalance" in rebalance logs.
This is a bug in dht-rebalance.c. The return-value check is inverted, so the error is logged when the call succeeds:

        ret = gf_defrag_total_file_cnt (this, &loc);
        if (!ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0, 0, "Failed to get "
                        "the total number of files. Unable to estimate "
                        "time to complete rebalance.");
        }

This should be:

        if (ret) {
                ...
        }
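To illustrate why the inverted check produces the message even though the estimate works, here is a minimal standalone sketch. It does not use any GlusterFS code: count_files_stub() is a hypothetical stand-in for gf_defrag_total_file_cnt(), and it assumes the usual convention of returning 0 on success and nonzero on failure, which is what the proposed fix implies. The helper names logs_error_buggy() and logs_error_fixed() are also made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for gf_defrag_total_file_cnt().
 * Assumption: 0 means success, nonzero means failure. */
int count_files_stub(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/* Original check, if (!ret): true when ret == 0, i.e. on success. */
int logs_error_buggy(int ret)
{
    return !ret;
}

/* Corrected check, if (ret): true only when ret != 0, i.e. on failure. */
int logs_error_fixed(int ret)
{
    return ret != 0;
}

/* Prints, for a successful and a failed call, whether each check
 * would log the error. */
void demo(void)
{
    int fail;

    for (fail = 0; fail <= 1; fail++) {
        int ret = count_files_stub(fail);

        printf("ret=%d buggy_logs=%d fixed_logs=%d\n",
               ret, logs_error_buggy(ret), logs_error_fixed(ret));
    }
}

/* Sanity checks on the two conditions. */
void self_check(void)
{
    assert(logs_error_buggy(0) == 1);   /* success wrongly reported */
    assert(logs_error_fixed(0) == 0);   /* success stays silent */
    assert(logs_error_fixed(-1) == 1);  /* failure is reported */
}
```

Calling demo() from a main() shows the buggy check firing on the successful call and staying silent on the failed one, which matches the report: the file count is obtained, the estimate is displayed, and the error is logged anyway.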
REVIEW: https://review.gluster.org/17197 (cluster/dht Fix ret check) posted (#1) for review on master by N Balachandran (nbalacha)
Verified this BZ on glusterfs version 3.8.4-27.el7rhgs.x86_64. After the fix, the error "Failed to get the total number of files. Unable to estimate time to complete rebalance" no longer appears in the rebalance logs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774