Bug 975382

Summary: [RHS-C] Server is moved to "UP" state from "Non-Operational" even though glusterd is stopped
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasanth <pprakash>
Component: rhsc
Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA
QA Contact: Prasanth <pprakash>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.1
CC: dpati, dtsang, knarra, mmahoney, pprakash, rhs-bugs, sabose, sdharane, sharne, ssampat, tjeyasin
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: cb6
Doc Type: Bug Fix
Doc Text:
Previously, when the glusterd service was not running on a host, operations were still allowed from the Console even though they failed on the host, and the status of such hosts was displayed as Up. With this update, the status of the glusterd service is checked and the host status is displayed as Non-Operational if the service is not running.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-25 07:31:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Prasanth 2013-06-18 10:23:46 UTC
Description of problem:

Server is moved to "UP" state from "Non-Operational" even though glusterd is stopped

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure that both are in UP state)
2. In server1, stop the glusterd (#/etc/init.d/glusterd stop)
3. In the UI, the status of server1 should now change to "Non-Operational"
4. Now try to activate server1 from the UI.

Actual results: The server is successfully activated and set to UP even though glusterd is still stopped on that server.

-----------
[root@qa-vm05 ~]# /etc/init.d/glusterd status
glusterd is stopped
[root@qa-vm05 ~]# ps aux |grep glusterd
root     18885  0.0  0.5 550748 24668 ?        Ssl  09:34   0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol2.qa-vm05.lab.eng.blr.redhat.com.home-2 -p /var/lib/glusterd/vols/vol2/run/qa-vm05.lab.eng.blr.redhat.com-home-2.pid -S /var/run/70a79a2a4a4a66bf8adfa15a546ac3b5.socket --brick-name /home/2 -l /var/log/glusterfs/bricks/home-2.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49155 --xlator-option vol2-server.listen-port=49155
root     20559  0.0  0.4 482132 20224 ?        Ssl  09:37   0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol1.qa-vm05.lab.eng.blr.redhat.com.home-1 -p /var/lib/glusterd/vols/vol1/run/qa-vm05.lab.eng.blr.redhat.com-home-1.pid -S /var/run/362d1fa221732351bdcad17d16c96352.socket --brick-name /home/1 -l /var/log/glusterfs/bricks/home-1.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49156 --xlator-option vol1-server.listen-port=49156
root     21939  0.2  1.6 336768 81024 ?        Ssl  09:41   0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/9bff6f78c2e8572b47d0bff1dbffee53.socket
root     21952  0.2  0.5 325784 26016 ?        Ssl  09:41   0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/371e4830a375abca0a0e78fe047e943e.socket --xlator-option *replicate*.node-uuid=005f5084-b648-4981-945c-d9860b216b06
root     26205  0.0  0.0 103244   836 pts/0    S+   10:17   0:00 grep glusterd
---------------
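
Note that in the `ps aux | grep glusterd` output above, the surviving processes are `glusterfsd` brick daemons and `glusterfs` NFS/self-heal helpers, which a naive substring match on "glusterd" also catches even though glusterd itself is stopped. A status check therefore has to distinguish the management daemon from those processes. A minimal sketch of such a check (hypothetical helper, not part of the Console or vdsm code):

```python
def glusterd_running(ps_lines):
    """Return True only if the glusterd management daemon itself appears
    in `ps aux` output, ignoring glusterfsd brick processes and glusterfs
    NFS/self-heal helpers that also match a naive `grep glusterd`."""
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # not a full `ps aux` row
        cmd = fields[10]  # first token of the COMMAND column
        # Match the daemon binary exactly, not glusterfsd/glusterfs.
        if cmd == "glusterd" or cmd.endswith("/glusterd"):
            return True
    return False
```

With the output captured in this report, this check correctly reports glusterd as stopped even though five gluster-related processes are still running.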

Event messages:

---------------
2013-Jun-18, 15:43 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 15:43 Host server1 was activated by admin@internal.
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 15:40 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 15:40 Failed to fetch gluster volume list from server server1.
---------------


Expected results: The Console should not activate the server while glusterd is down, and it should indicate the reason in the event messages.


Additional info: "Activate" should try to restart glusterd on the server and only then bring the server UP.

Comment 2 Prasanth 2013-06-18 11:50:09 UTC
One more point to note: even if we don't activate the server manually, the sync job does the same after 5 minutes and marks the server as UP even though glusterd is still stopped on that server. After a few seconds it goes back to "Non-Operational" again, and this cycle continues.

Corresponding event messages generated:

------------------------	
Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:15 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:15 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:10 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service glusterd on server server1 changed from RUNNING to STOPPED. Updating in engine now.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
------------------------

Comment 3 Sahina Bose 2013-07-02 05:47:42 UTC
Vdsm verb for activate needs to check if glusterd is running, and try to start glusterd on activate.
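
A minimal sketch of that activate-time check (hypothetical code, not the actual vdsm verb; the SysV init-script convention that `status` exits 0 only when the service is running is assumed):

```python
import subprocess

def ensure_glusterd(run=subprocess.call):
    """On activate: verify glusterd is running, attempt a start if it
    is not, and report whether the host may be marked Up.

    `run` is injectable for testing; by default it invokes the SysV
    init script, whose `status` action exits 0 only when running."""
    if run(["/etc/init.d/glusterd", "status"]) != 0:
        # glusterd is down: try to start it before declaring the host Up.
        if run(["/etc/init.d/glusterd", "start"]) != 0:
            return False  # start failed: host should stay Non-Operational
    return True
```

The engine-side behavior would then follow the return value: activate succeeds only when glusterd is confirmed running (or was started successfully), otherwise the host remains Non-Operational, as verified in comment 4.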

Comment 4 Matt Mahoney 2013-11-05 15:51:48 UTC
Verified against CB6.

Host stays in Non Operational state, even after attempting to Activate the host while glusterd is stopped.

Comment 5 Shalaka 2014-01-07 09:49:13 UTC
Please review the edited Doc Text and sign off.

Comment 6 Sahina Bose 2014-01-30 07:06:51 UTC
Have edited the doc text.

Comment 8 errata-xmlrpc 2014-02-25 07:31:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html