Bug 975382 - [RHS-C] Server is moved to "UP" state from "Non-Operational" even though glusterd is stopped
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: ---
Target Release: RHGS 2.1.2
Assigned To: Sahina Bose
QA Contact: Prasanth
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-06-18 06:23 EDT by Prasanth
Modified: 2015-05-15 14:35 EDT

See Also:
Fixed In Version: cb6
Doc Type: Bug Fix
Doc Text:
Previously, when the glusterd service was not running on a host, the Console still displayed the host status as Up and allowed operations that then failed on the host. With this update, the status of the glusterd service is checked, and the host status is displayed as Non-Operational if the service is not running.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-25 02:31:21 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
oVirt gerrit  16898  None      None    None     Never
oVirt gerrit  21081  None      None    None     Never

Description Prasanth 2013-06-18 06:23:46 EDT
Description of problem:

Server is moved to "UP" state from "Non-Operational" even though glusterd is stopped

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure that both are in the UP state).
2. On server1, stop glusterd (# /etc/init.d/glusterd stop).
3. In the UI, the status of server1 should now change to "Non-Operational".
4. Now try to activate server1 from the UI.

Actual results: The server is successfully activated and set to UP even though glusterd is still stopped on that server.

-----------
[root@qa-vm05 ~]# /etc/init.d/glusterd status
glusterd is stopped
[root@qa-vm05 ~]# ps aux |grep glusterd
root     18885  0.0  0.5 550748 24668 ?        Ssl  09:34   0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol2.qa-vm05.lab.eng.blr.redhat.com.home-2 -p /var/lib/glusterd/vols/vol2/run/qa-vm05.lab.eng.blr.redhat.com-home-2.pid -S /var/run/70a79a2a4a4a66bf8adfa15a546ac3b5.socket --brick-name /home/2 -l /var/log/glusterfs/bricks/home-2.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49155 --xlator-option vol2-server.listen-port=49155
root     20559  0.0  0.4 482132 20224 ?        Ssl  09:37   0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol1.qa-vm05.lab.eng.blr.redhat.com.home-1 -p /var/lib/glusterd/vols/vol1/run/qa-vm05.lab.eng.blr.redhat.com-home-1.pid -S /var/run/362d1fa221732351bdcad17d16c96352.socket --brick-name /home/1 -l /var/log/glusterfs/bricks/home-1.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49156 --xlator-option vol1-server.listen-port=49156
root     21939  0.2  1.6 336768 81024 ?        Ssl  09:41   0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/9bff6f78c2e8572b47d0bff1dbffee53.socket
root     21952  0.2  0.5 325784 26016 ?        Ssl  09:41   0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/371e4830a375abca0a0e78fe047e943e.socket --xlator-option *replicate*.node-uuid=005f5084-b648-4981-945c-d9860b216b06
root     26205  0.0  0.0 103244   836 pts/0    S+   10:17   0:00 grep glusterd
---------------
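Note that every line in the listing above matches `grep glusterd` only because of paths such as /var/lib/glusterd or options such as glusterd-uuid in the command line; none of the processes is the glusterd daemon itself. A minimal sketch of a stricter check, assuming the standard 11-column `ps aux` layout (the function name `glusterd_running` is hypothetical and not part of any existing tool):

```python
import os


def glusterd_running(ps_lines):
    """Return True only if some process's executable is named exactly 'glusterd'.

    ps_lines: iterable of lines in `ps aux` format (11 columns, COMMAND last).
    """
    for line in ps_lines:
        fields = line.split(None, 10)  # keep the full command line in one field
        if len(fields) < 11:
            continue  # not a complete ps row
        command = fields[10].split()
        # Compare the executable's basename, not the whole command line,
        # so paths like /var/lib/glusterd/... do not produce a false match.
        if command and os.path.basename(command[0]) == "glusterd":
            return True
    return False


sample = [
    "root 18885 0.0 0.5 550748 24668 ? Ssl 09:34 0:02 "
    "/usr/sbin/glusterfsd -s host -p /var/lib/glusterd/vols/vol2/run/brick.pid",
    "root 26205 0.0 0.0 103244 836 pts/0 S+ 10:17 0:00 grep glusterd",
]
print(glusterd_running(sample))
```

For the two sample rows, which are shortened from the listing above, the check reports that glusterd is not running, in agreement with the init script status.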

Event messages:

---------------
2013-Jun-18, 15:43 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 15:43 Host server1 was activated by admin@internal.
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 15:40 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 15:40 Failed to fetch gluster volume list from server server1.
---------------


Expected results: The server should not be activated while glusterd is down, and the event messages should indicate the probable reason.


Additional info: "Activate" should try to restart glusterd on the server and only then try to bring the server UP.
Comment 2 Prasanth 2013-06-18 07:50:09 EDT
One more point to note: if the server is not activated manually, the sync job does the same after 5 minutes and marks the server as UP even though glusterd is still stopped on it. After a few seconds the server goes back to "Non-Operational", and this cycle continues.

Corresponding event messages generated:

------------------------	
Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:15 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:15 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:10 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service glusterd on server server1 changed from RUNNING to STOPPED. Updating in engine now.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
------------------------
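The 5-minute cycle can be read straight from the event messages above. A small sketch that extracts the "Host state was set to Up." events and reports the gap between them, assuming each event line starts with a timestamp in the "2013-Jun-18, 17:15" form shown in this report (the helper names are hypothetical):

```python
from datetime import datetime

UP_MARKER = "Host state was set to Up."


def up_event_times(event_lines):
    """Return the sorted timestamps of all 'set to Up' events."""
    times = []
    for line in event_lines:
        if UP_MARKER not in line:
            continue
        # The timestamp is the first two space-separated tokens, e.g. "2013-Jun-18, 17:15".
        date_part, time_part, _ = line.strip().split(" ", 2)
        times.append(datetime.strptime(date_part + " " + time_part, "%Y-%b-%d, %H:%M"))
    return sorted(times)


def minutes_between(times):
    """Return the gaps, in whole minutes, between consecutive timestamps."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


events = [
    "2013-Jun-18, 17:15 Detected new Host server1. Host state was set to Up.",
    "2013-Jun-18, 17:15 Failed to fetch gluster volume list from server server1.",
    "2013-Jun-18, 17:10 Detected new Host server1. Host state was set to Up.",
]
print(minutes_between(up_event_times(events)))
```

The month abbreviation is parsed with `%b`, which depends on the process locale, so this assumes the default C/English locale.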
Comment 3 Sahina Bose 2013-07-02 01:47:42 EDT
The VDSM verb for activate needs to check whether glusterd is running and try to start it during activation.
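A minimal sketch of that activation flow, for illustration only. This is not the actual VDSM or engine code: the function name `activate_host` and both callables are hypothetical, and a real implementation would query and start the service on the host.

```python
def activate_host(is_glusterd_running, start_glusterd):
    """Decide the host status for an activate request.

    Both arguments are caller-supplied callables (hypothetical hooks):
      is_glusterd_running() -> bool
      start_glusterd()      -> None, may raise OSError on failure
    Returns a (status, reason) tuple.
    """
    if is_glusterd_running():
        return "Up", "glusterd is running"
    try:
        start_glusterd()  # one start attempt, as suggested in this bug
    except OSError as err:
        return "Non-Operational", "failed to start glusterd: %s" % err
    # Re-check instead of trusting the start attempt.
    if is_glusterd_running():
        return "Up", "glusterd was started during activation"
    return "Non-Operational", "glusterd is still not running after a start attempt"
```

Returning a reason alongside the status leaves room for the event message that the expected results ask for.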
Comment 4 Matt Mahoney 2013-11-05 10:51:48 EST
Verified against CB6.

The host stays in the Non-Operational state even after an attempt to activate it while glusterd is stopped.
Comment 5 Shalaka 2014-01-07 04:49:13 EST
Please review the edited DocText and sign off.
Comment 6 Sahina Bose 2014-01-30 02:06:51 EST
I have edited the DocText.
Comment 8 errata-xmlrpc 2014-02-25 02:31:21 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
