Bug 1021441 - [RHSC] Status of bricks that reside on a server which is down, should be shown as down.
Summary: [RHSC] Status of bricks that reside on a server which is down, should be shown as down.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Sahina Bose
QA Contact: Shruti Sampat
URL:
Whiteboard:
Depends On: 1045374
Blocks:
 
Reported: 2013-10-21 09:57 UTC by Shruti Sampat
Modified: 2015-05-13 16:32 UTC
CC List: 9 users

Fixed In Version: cb11
Doc Type: Bug Fix
Doc Text:
Previously, when a host was down, the status of the bricks on that host was displayed as Up. Now, the correct status is displayed.
Clone Of:
Environment:
Last Closed: 2014-02-25 07:41:49 UTC
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status        Summary                                                 Last Updated
Red Hat Product Errata  RHEA-2014:0208  0        normal    SHIPPED_LIVE  Red Hat Storage 2.1 enhancement and bug fix update #2   2014-02-25 12:20:30 UTC
oVirt gerrit            21444           0        None      None          None                                                    Never
oVirt gerrit            21951           0        None      None          None                                                    Never

Description Shruti Sampat 2013-10-21 09:57:38 UTC
Description of problem:
-------------------------
When a storage server in a cluster goes down, the status of bricks that reside on that server should be shown as Down.

Currently the status of such bricks remains UP if the volume is started.

Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.2-0.0.scratch.beta1.el6_4 

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster and add two hosts to it.
2. Create a volume with bricks on both servers.
3. Bring down one of the hosts.
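
A minimal CLI sketch of these steps, assuming two hypothetical servers (server1, server2) and a brick path like the one used later in this bug; the exact commands are an assumption:

# on server1: build a two-brick distribute volume (names and paths are illustrative)
gluster peer probe server2
gluster volume create dis_vol server1:/rhs/brick1/b1 server2:/rhs/brick1/b1
gluster volume start dis_vol

# on server2: bring the host down
poweroff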

Actual results:
The bricks that reside on the server that is down are still shown with UP status.

Expected results:
Bricks that are on a server which is down should be shown as Down.

Additional info:

Comment 2 Shruti Sampat 2013-11-26 07:26:57 UTC
Hi Sahina,

I have the following observations - 

1. Power off one server in a cluster of 4 servers - 

The server moves to non-responsive, but the bricks are still shown as UP in the UI.

2. Kill glusterd on one of the servers in a cluster of 4 servers - 

The server moves to non-operational, and the bricks are now shown as DOWN in the UI.

I was expecting the bricks to be shown as DOWN in case 1, as the bricks are actually not usable.

In case 2, even though glusterd is not running on the server, the bricks are still usable. So is it right to show the bricks as DOWN?

Bricks that reside on servers that are down, and on servers where glusterd is not running, are not displayed in the output of the 'gluster volume status' command.

For example:

In a cluster of the following 4 servers, 

10.70.37.84 - powered off
10.70.37.132 - glusterd down
10.70.37.64 - up and running
10.70.37.176 - up and running
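
(The exact commands used to induce these two states are not recorded here; a plausible sketch on EL6-based RHS nodes would be:)

# on 10.70.37.84: take the whole server down
poweroff

# on 10.70.37.132: stop only the management daemon, leaving the brick processes (glusterfsd) running
service glusterd stop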

The following commands were run on 10.70.37.64 - 


[root@rhs ~]# gluster volume info dis_vol
 
Volume Name: dis_vol
Type: Distribute
Volume ID: a7a904f8-b4ca-4ba2-a176-966e4a286fab
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.84:/rhs/brick1/b1
Brick2: 10.70.37.132:/rhs/brick1/b1
Brick3: 10.70.37.64:/rhs/brick1/b1
Brick4: 10.70.37.176:/rhs/brick1/b1
Options Reconfigured:
auth.allow: *
user.cifs: enable
nfs.disable: off


[root@rhs ~]# gluster volume status dis_vol                                                                                                                                                                    
Status of volume: dis_vol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.64:/rhs/brick1/b1                        49152   Y       12035
Brick 10.70.37.176:/rhs/brick1/b1                       49152   Y       22190
NFS Server on localhost                                 2049    Y       21872
NFS Server on 10.70.37.176                              2049    Y       31865
 
Task Status of Volume dis_vol
------------------------------------------------------------------------------
There are no active volume tasks
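
As an aside, the bricks missing from the status output can be listed by comparing the configured bricks against the online ones. This is only an illustrative sketch, not how the console determines brick status:

gluster volume info dis_vol   | awk -F': ' '/^Brick[0-9]+:/ {print $2}' | sort > /tmp/configured
gluster volume status dis_vol | awk '/^Brick / {print $2}'              | sort > /tmp/online
comm -23 /tmp/configured /tmp/online    # bricks with no entry in 'volume status'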

Comment 3 Shruti Sampat 2013-11-26 08:56:42 UTC
After the non-operational host is brought back up by starting glusterd on it, the brick status is set to UP only as part of the periodic sync job, and not immediately when the host status changes to UP.
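
(On an EL6-based RHS node this would typically be done as below; the exact command is an assumption, the comment only states that glusterd was started.)

service glusterd start    # host returns to UP; brick status follows only on the next sync cycle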

Comment 4 Sahina Bose 2013-12-03 06:43:09 UTC
Case 1 - when host is non-responsive.
In this case, the host has moved to the non-responsive state because of a communication failure between the engine and the host (which can have many causes, for instance vdsm not running, or the server being powered off).

In this case, the brick status can be moved to UNKNOWN, as the engine cannot determine the status unless the sync job returns the brick status from another host. Proposed flow: if the host is non-responsive, change the brick status to UNKNOWN. If the sync job determines the status from another server, the status will be moved to UP/DOWN.
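
For illustration, the status the sync job needs can be fetched from any responsive peer. A hedged sketch using the gluster CLI's --remote-host option (the engine actually obtains this data through vdsm on a responsive host, not by running the CLI):

# 1. host becomes non-responsive            -> mark its bricks UNKNOWN
# 2. periodic sync queries a peer that is still responsive:
gluster --remote-host=10.70.37.176 volume status dis_vol
# 3. bricks reported online -> UP; configured bricks missing from the output -> DOWN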

Case 2 - When host is non-operational.

The host moves to non-operational because glusterd is not running. In this case, 'gluster volume status' does not return the brick status either, operations on the brick such as remove-brick and fetching advanced brick details fail, and the brick is offline for all practical purposes. Hence, moving the brick to status DOWN seems appropriate.

Comment 5 Sahina Bose 2013-12-03 06:44:07 UTC
Moving this bug to ASSIGNED to take care of Case 1.
Dusmant, please confirm the proposed flow.

Comment 6 Dusmant 2013-12-03 18:04:39 UTC
Proposed flow is fine.

Comment 7 Shruti Sampat 2013-12-20 10:04:48 UTC
I am observing the following behavior when the steps below are performed - 

1. On a cluster of 4 nodes, power off one node, and stop the network service on another. This causes both of these servers to move to the non-responsive state. The bricks residing on these servers are set to '?' (unknown) status.

2. Bring the powered-off server back up.

The bricks residing on this server are supposed to come up, but did not. According to the vdsm logs, this is because the "gluster volume status" command fails, due to BZ #1045374.

Will verify this BZ after the above BZ is fixed.

Comment 8 Shruti Sampat 2014-01-03 06:42:49 UTC
Verified as fixed in Red Hat Storage Console Version: 2.1.2-0.30.el6rhs.

Comment 9 Shalaka 2014-01-21 07:17:09 UTC
Please review the edited DocText and sign off.

Comment 10 Sahina Bose 2014-01-21 08:14:16 UTC
Looks ok

Comment 12 errata-xmlrpc 2014-02-25 07:41:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

