Description of problem:
Server is moved to "UP" state from "Non-Operational" even though glusterd is stopped.

Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure that both are in UP state).
2. On server1, stop glusterd (# /etc/init.d/glusterd stop).
3. In the UI, the status of server1 should now change to "Non-Operational".
4. Now try to activate server1 from the UI.

Actual results:
The server is successfully activated and set to UP even though glusterd is still stopped on that server.
-----------
[root@qa-vm05 ~]# /etc/init.d/glusterd status
glusterd is stopped
[root@qa-vm05 ~]# ps aux |grep glusterd
root 18885 0.0 0.5 550748 24668 ? Ssl 09:34 0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol2.qa-vm05.lab.eng.blr.redhat.com.home-2 -p /var/lib/glusterd/vols/vol2/run/qa-vm05.lab.eng.blr.redhat.com-home-2.pid -S /var/run/70a79a2a4a4a66bf8adfa15a546ac3b5.socket --brick-name /home/2 -l /var/log/glusterfs/bricks/home-2.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49155 --xlator-option vol2-server.listen-port=49155
root 20559 0.0 0.4 482132 20224 ? Ssl 09:37 0:02 /usr/sbin/glusterfsd -s qa-vm05.lab.eng.blr.redhat.com --volfile-id vol1.qa-vm05.lab.eng.blr.redhat.com.home-1 -p /var/lib/glusterd/vols/vol1/run/qa-vm05.lab.eng.blr.redhat.com-home-1.pid -S /var/run/362d1fa221732351bdcad17d16c96352.socket --brick-name /home/1 -l /var/log/glusterfs/bricks/home-1.log --xlator-option *-posix.glusterd-uuid=005f5084-b648-4981-945c-d9860b216b06 --brick-port 49156 --xlator-option vol1-server.listen-port=49156
root 21939 0.2 1.6 336768 81024 ? Ssl 09:41 0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/9bff6f78c2e8572b47d0bff1dbffee53.socket
root 21952 0.2 0.5 325784 26016 ? Ssl 09:41 0:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/371e4830a375abca0a0e78fe047e943e.socket --xlator-option *replicate*.node-uuid=005f5084-b648-4981-945c-d9860b216b06
root 26205 0.0 0.0 103244 836 pts/0 S+ 10:17 0:00 grep glusterd
---------------
Event messages:
---------------
2013-Jun-18, 15:43 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 15:43 Host server1 was activated by admin@internal.
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 15:40 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 15:40 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 15:40 Failed to fetch gluster volume list from server server1.
---------------

Expected results:
The server should not be activated while glusterd is down. It should also indicate the possible reason in the event messages.

Additional info:
"Activate" should try to restart glusterd on the server and then try to bring the server UP.
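Note on reading the ps output above: `grep glusterd` matches the glusterfsd and glusterfs processes only because their arguments contain paths under /var/lib/glusterd, so the listing is consistent with glusterd itself being stopped. A minimal sketch of a stricter check follows. It assumes the usual 11-column `ps aux` layout, and the function name `glusterd_running` is hypothetical, not part of any gluster or vdsm tool.

```python
import os


def glusterd_running(ps_lines):
    """Return True only if a process's executable is literally 'glusterd'.

    Assumes `ps aux` output: 10 whitespace-separated columns followed by
    the full command line as the 11th.
    """
    for line in ps_lines:
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # header fragment or malformed line
        executable = fields[10].split()[0]
        if os.path.basename(executable) == "glusterd":
            return True
    return False


# Abbreviated lines in the same shape as the output in this report.
sample = [
    "root 18885 0.0 0.5 550748 24668 ? Ssl 09:34 0:02 /usr/sbin/glusterfsd "
    "-s qa-vm05 -p /var/lib/glusterd/vols/vol2/run/brick.pid",
    "root 26205 0.0 0.0 103244 836 pts/0 S+ 10:17 0:00 grep glusterd",
]
print(glusterd_running(sample))
```

With only brick and grep processes in the list, the check reports that glusterd is not running, which agrees with the init script status shown in the report.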
One more point to note: if we don't activate the server manually, the sync job does the same after 5 minutes and marks the server as UP even though glusterd is still stopped on that server. After a few seconds it goes back to "Non-Operational" again, and this cycle continues.

Corresponding event messages generated:
------------------------
Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:15 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:15 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:15 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:15 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:11 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:10 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:10 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:10 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
2013-Jun-18, 17:10 Detected new Host server1. Host state was set to Up.
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to STOPPED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from MIXED to RUNNING on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from STOPPED to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service type <UNKNOWN> changed from RUNNING to MIXED on cluster mycluster
2013-Jun-18, 17:05 Status of service glusterd on server server1 changed from RUNNING to STOPPED. Updating in engine now.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster volume list from server server1.
2013-Jun-18, 17:05 Gluster command [<UNKNOWN>] failed on server <UNKNOWN>.
2013-Jun-18, 17:05 Failed to fetch gluster peer list from server server1 on Cluster <UNKNOWN>.
------------------------
The VDSM verb for activate needs to check whether glusterd is running and, if it is not, try to start glusterd during activation.
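The check described above could look roughly like the sketch below. This is an illustration only, not the actual VDSM code: the function names `run_init`, `ensure_glusterd` and `activate_host` are hypothetical, and it assumes the init script follows the LSB convention of returning 0 from `status` when the service is running and non-zero otherwise.

```python
import subprocess

GLUSTERD_INIT = "/etc/init.d/glusterd"  # init script path as used in this report


def run_init(action):
    """Run the init script with the given action and return its exit code."""
    return subprocess.call([GLUSTERD_INIT, action])


def ensure_glusterd(run=run_init):
    """Return True if glusterd is running, trying one start if it is not.

    Assumption: `status` exits 0 only when the service is running.
    """
    if run("status") == 0:
        return True
    run("start")  # single start attempt before re-checking
    return run("status") == 0


def activate_host(run=run_init):
    """Hypothetical activate handler: report Up only when glusterd is running."""
    if ensure_glusterd(run):
        return "Up"
    return "Non-Operational"
```

The `run` parameter exists only so the logic can be exercised without touching a real service. With a runner whose `start` never succeeds, `activate_host` returns "Non-Operational", which is the behaviour requested in this report.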
Verified against CB6. Host stays in Non Operational state, even after attempting to Activate the host while glusterd is stopped.
Please review the edited DocText and sign off.
I have edited the DocText.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html