Description of problem:
The 'gluster get-state' command fails when any brick of a volume is missing or has been deleted. Instead, the command output should report the brick failure.

When any brick of a volume is unavailable or has been removed, 'gluster get-state' fails with the following error:

'Failed to get daemon state. Check glusterd log file for more details'

The requirement is that 'gluster get-state' should not fail, and should instead report the brick's state in its output. For example:

cat /var/run/gluster/glusterd_state_XYZ
...
Volume3.name: v02
Volume3.id: c194e70d-6738-4ba3-9502-ec5603aab679
Volume3.type: Distributed-Replicate
...
## HERE #
Volume3.Brick1.port: N/A or 0 or empty?
Volume3.Brick1.rdma_port: 0
Volume3.Brick1.port_registered: N/A or 0 or empty?
Volume3.Brick1.status: Failed
Volume3.Brick1.spacefree: N/A or 0 or empty?
Volume3.Brick1.spacetotal: N/A or 0 or empty?
...

This situation can occur in production when local storage on a node is broken, or when using heketi with gluster: the volumes are present but bricks are missing.

Version-Release number of selected component (if applicable):
RHGS 3.X

How reproducible:
Always

Steps to Reproduce:
1. Delete a brick
2. Run 'gluster get-state'

Actual results:
The command fails with the message:
'Failed to get daemon state. Check glusterd log file for more details'

Expected results:
'gluster get-state' should not fail. It should report the faulty brick's state in the output, so one can easily identify what is wrong with the volume.
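The failure mode above can be demonstrated in isolation: statvfs(3) on a deleted brick directory returns -1 with errno set, which is the condition glusterd trips over. A minimal sketch; brick_statvfs_ok is a hypothetical helper name, not a gluster function:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Returns 1 if the brick path can be statvfs'd, 0 otherwise.
 * A deleted brick directory makes statvfs fail with ENOENT, which is
 * exactly the condition that makes 'gluster get-state' error out. */
int brick_statvfs_ok(const char *brick_path)
{
    struct statvfs st;

    if (statvfs(brick_path, &st) != 0) {
        fprintf(stderr, "statvfs(%s) failed: %s\n",
                brick_path, strerror(errno));
        return 0;
    }
    return 1;
}
```

After step 1 of the reproducer (deleting a brick directory), a check like this on the brick path returns 0, mirroring what glusterd sees.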
--- Additional comment from Atin Mukherjee on 2019-01-28 15:10:36 IST ---

Root cause, from glusterd_get_state():

<snip>
ret = sys_statvfs(brickinfo->path, &brickstat);
if (ret) {
    gf_msg(this->name, GF_LOG_ERROR, errno, GD_MSG_FILE_OP_FAILED,
           "statfs error: %s ", strerror(errno));
    goto out;
}
memfree = brickstat.f_bfree * brickstat.f_bsize;
memtotal = brickstat.f_blocks * brickstat.f_bsize;

fprintf(fp, "Volume%d.Brick%d.spacefree: %" PRIu64 "Bytes\n",
        count_bkp, count, memfree);
fprintf(fp, "Volume%d.Brick%d.spacetotal: %" PRIu64 "Bytes\n",
        count_bkp, count, memtotal);
</snip>

A statfs call is made on the brick path of every brick of every volume to calculate total vs. free space. In this case we shouldn't error out on a statfs failure, and should instead report spacefree and spacetotal as unavailable or 0 bytes.

--- Additional comment from Atin Mukherjee on 2019-02-04 07:59:34 IST ---

We need test coverage to ensure that the get-state command generates output successfully even if underlying brick(s) of volume(s) in the cluster go bad.

--- Additional comment from sankarshan on 2019-02-04 14:48:30 IST ---

(In reply to Atin Mukherjee from comment #4)
> We need to have a test coverage to ensure that get-state command should
> generate an output successfully even if underlying brick(s) of volume(s) in
> the cluster go bad.

The test coverage flag needs to be set.
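A minimal sketch of the non-fatal handling described above, with plain statvfs(3) standing in for gluster's sys_statvfs wrapper; write_brick_space, vol, and brick are hypothetical names (corresponding to the snippet's count_bkp/count) used for illustration, not the actual patch:

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* On statvfs failure, skip the 'goto out' and report both values as 0,
 * so the rest of the get-state dump is still generated. */
void write_brick_space(FILE *fp, const char *brick_path,
                       int vol, int brick)
{
    struct statvfs brickstat;
    uint64_t memfree = 0;
    uint64_t memtotal = 0;

    if (statvfs(brick_path, &brickstat) == 0) {
        memfree = brickstat.f_bfree * brickstat.f_bsize;
        memtotal = brickstat.f_blocks * brickstat.f_bsize;
    }
    /* else: leave both counters at 0 instead of aborting the dump */

    fprintf(fp, "Volume%d.Brick%d.spacefree: %" PRIu64 "Bytes\n",
            vol, brick, memfree);
    fprintf(fp, "Volume%d.Brick%d.spacetotal: %" PRIu64 "Bytes\n",
            vol, brick, memtotal);
}
```

With a missing brick path, this emits "Volume3.Brick1.spacefree: 0Bytes" and "Volume3.Brick1.spacetotal: 0Bytes" rather than failing, which matches the expected output in the problem description.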
REVIEW: https://review.gluster.org/22147 (glusterd: get-state command should not fail if any brick is gone bad) posted (#1) for review on master by Sanju Rakonde
COMMIT: https://review.gluster.org/22147 committed in master by "Atin Mukherjee" <amukherj> with a commit message:

glusterd: get-state command should not fail if any brick is gone bad

Problem: The get-state command errors out if any of the underlying brick(s) of volume(s) in the cluster go bad. It is expected that the get-state command should not error out, but should generate output successfully.

Solution: In glusterd_get_state(), a statfs call is made on the brick path of every brick of every volume to calculate the total and free space available. If the statfs call fails on any brick, we should not error out, and should instead report that brick's total and free space as 0.

This patch also handles a statfs failure scenario in glusterd_store_retrieve_bricks().

fixes: bz#1672205
Change-Id: Ia9e8a1d8843b65949d72fd6809bd21d39b31ad83
Signed-off-by: Sanju Rakonde <srakonde>
This bug is being closed because a release is now available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/