Description of problem:
The 'gluster get-state' command fails when any brick of a volume is missing or has been deleted. Instead, the command output should report the brick failure.

When any brick of a volume is unavailable or has been removed, 'gluster get-state' fails with the following error:

'Failed to get daemon state. Check glusterd log file for more details'

The requirement is that 'gluster get-state' should not fail, and should instead report the brick's state in its output. For example:

cat /var/run/gluster/glusterd_state_XYZ
...
Volume3.name: v02
Volume3.id: c194e70d-6738-4ba3-9502-ec5603aab679
Volume3.type: Distributed-Replicate
...
## HERE #
Volume3.Brick1.port: N/A or 0 or empty?
Volume3.Brick1.rdma_port: 0
Volume3.Brick1.port_registered: N/A or 0 or empty?
Volume3.Brick1.status: Failed
Volume3.Brick1.spacefree: N/A or 0 or empty?
Volume3.Brick1.spacetotal: N/A or 0 or empty?
...

This situation can occur in production when local storage on a node is broken, or when using heketi with gluster: the volumes are present but bricks are missing.

Version-Release number of selected component (if applicable):
RHGS 3.X

How reproducible:
Always

Steps to Reproduce:
1. Delete a brick
2. Run 'gluster get-state'

Actual results:
The command fails with the message:
'Failed to get daemon state. Check glusterd log file for more details'

Expected results:
'gluster get-state' should not fail. It should report the faulty brick's state in the output, so one can easily identify what is wrong with the volume.
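The failure mode above can be demonstrated in isolation: statvfs(3) on a deleted brick directory returns -1 with errno set, which is the condition glusterd trips over. A minimal sketch; brick_statvfs_ok is a hypothetical helper name, not a gluster function:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Returns 1 if the brick path can be statvfs'd, 0 otherwise.
 * A deleted brick directory makes statvfs fail with ENOENT, which is
 * exactly the condition that makes 'gluster get-state' error out. */
int brick_statvfs_ok(const char *brick_path)
{
    struct statvfs st;

    if (statvfs(brick_path, &st) != 0) {
        fprintf(stderr, "statvfs(%s) failed: %s\n",
                brick_path, strerror(errno));
        return 0;
    }
    return 1;
}
```

After step 1 of the reproducer (deleting a brick directory), a check like this on the brick path returns 0, mirroring what glusterd sees.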
--- Additional comment from Atin Mukherjee on 2019-01-28 15:10:36 IST ---

Root cause, from glusterd_get_state():

<snip>
ret = sys_statvfs(brickinfo->path, &brickstat);
if (ret) {
    gf_msg(this->name, GF_LOG_ERROR, errno, GD_MSG_FILE_OP_FAILED,
           "statfs error: %s ", strerror(errno));
    goto out;
}
memfree = brickstat.f_bfree * brickstat.f_bsize;
memtotal = brickstat.f_blocks * brickstat.f_bsize;

fprintf(fp, "Volume%d.Brick%d.spacefree: %" PRIu64 "Bytes\n",
        count_bkp, count, memfree);
fprintf(fp, "Volume%d.Brick%d.spacetotal: %" PRIu64 "Bytes\n",
        count_bkp, count, memtotal);
</snip>

A statfs call is made on the brick path of every brick of every volume to calculate total vs. free space. In this case we shouldn't error out on a statfs failure, and should instead report spacefree and spacetotal as unavailable or 0 bytes.

--- Additional comment from Atin Mukherjee on 2019-02-04 07:59:34 IST ---

We need test coverage to ensure that the get-state command generates output successfully even if underlying brick(s) of volume(s) in the cluster go bad.

--- Additional comment from sankarshan on 2019-02-04 14:48:30 IST ---

(In reply to Atin Mukherjee from comment #4)
> We need to have a test coverage to ensure that get-state command should
> generate an output successfully even if underlying brick(s) of volume(s) in
> the cluster go bad.

The test coverage flag needs to be set.
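A minimal sketch of the non-fatal handling described above, with plain statvfs(3) standing in for gluster's sys_statvfs wrapper; write_brick_space, vol, and brick are hypothetical names (corresponding to the snippet's count_bkp/count) used for illustration, not the actual patch:

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* On statvfs failure, skip the 'goto out' and report both values as 0,
 * so the rest of the get-state dump is still generated. */
void write_brick_space(FILE *fp, const char *brick_path,
                       int vol, int brick)
{
    struct statvfs brickstat;
    uint64_t memfree = 0;
    uint64_t memtotal = 0;

    if (statvfs(brick_path, &brickstat) == 0) {
        memfree = brickstat.f_bfree * brickstat.f_bsize;
        memtotal = brickstat.f_blocks * brickstat.f_bsize;
    }
    /* else: leave both counters at 0 instead of aborting the dump */

    fprintf(fp, "Volume%d.Brick%d.spacefree: %" PRIu64 "Bytes\n",
            vol, brick, memfree);
    fprintf(fp, "Volume%d.Brick%d.spacetotal: %" PRIu64 "Bytes\n",
            vol, brick, memtotal);
}
```

With a missing brick path, this emits "Volume3.Brick1.spacefree: 0Bytes" and "Volume3.Brick1.spacetotal: 0Bytes" rather than failing, which matches the expected output in the problem description.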
REVIEW: https://review.gluster.org/22147 (glusterd: get-state command should not fail if any brick is gone bad) posted (#1) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/22147 committed in master by "Atin Mukherjee" <amukherj> with a commit message:

glusterd: get-state command should not fail if any brick is gone bad

Problem: The get-state command errors out if any of the underlying brick(s) of volume(s) in the cluster go bad. It is expected that the get-state command should not error out, but should generate output successfully.

Solution: In glusterd_get_state(), a statfs call is made on the brick path of every brick of every volume to calculate the total and free space available. If the statfs call fails on any brick, we should not error out, and should instead report that brick's total and free space as 0.

This patch also handles a statfs failure scenario in glusterd_store_retrieve_bricks().

fixes: bz#1672205
Change-Id: Ia9e8a1d8843b65949d72fd6809bd21d39b31ad83
Signed-off-by: Sanju Rakonde <srakonde>
This bug is being closed because a release is now available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/