Bug 1368487

Summary: RHGS/Gluster node is still in non-operational state, even after restarting the glusterd service from UI
Product: [oVirt] ovirt-engine
Component: Frontend.WebAdmin
Version: 4.0.2.7
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Hardware: x86_64
OS: Linux
Type: Bug
Target Milestone: ovirt-4.1.1
Target Release: 4.1.1.2
oVirt Team: Gluster
Reporter: SATHEESARAN <sasundar>
Assignee: Sahina Bose <sabose>
QA Contact: SATHEESARAN <sasundar>
CC: bugs, sabose
Flags: rule-engine: ovirt-4.1+, rule-engine: planning_ack+, sabose: devel_ack+, sasundar: testing_ack+
Last Closed: 2017-04-21 09:40:29 UTC

Description SATHEESARAN 2016-08-19 14:33:49 UTC
Description of problem:
-----------------------
Stopped the glusterd service on an RHGS/Gluster node. The node moved to the non-operational state, and the 'General' tab for that node showed the action item 'Restart glusterd service'.

After clicking that option - 'Restart glusterd service' - the node remained in the non-operational state.

The glusterd service did actually come up on that node.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.0.2-7

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add an RHGS/Gluster node to a 3.6 gluster-only cluster
2. Stop the glusterd service on that node from the backend:
# systemctl stop glusterd
3. Once the node moves to non-operational, click on the host and select the action 'Restart glusterd service' from the 'General' tab (the actual state of the service on the node can be confirmed as shown below)
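
Not part of the original report: a quick way to confirm from the backend that glusterd really came back up after the UI action, using standard systemd tooling:

# systemctl is-active glusterd
# journalctl -u glusterd --since "5 minutes ago"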

Actual results:
----------------
The glusterd service actually came up on that RHGS/Gluster node, but the UI still showed the host as non-operational.

After some time (around 4 minutes), the node finally came up, reported as auto-recovered.

Expected results:
----------------
The node should come back up as soon as the glusterd service is restarted from the UI, without waiting for auto-recovery.

Additional info:
----------------
The node auto-recovers around 4 minutes after the glusterd service is restarted.
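
The ~4-minute delay matches the host being picked up by the engine's periodic auto-recovery job rather than the restart action itself refreshing the host status. Assuming the schedule is exposed through the AutoRecoverySchedule configuration key (an assumption, not verified in this report), it can be inspected on the engine host with:

# engine-config -g AutoRecoverySchedule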

Comment 1 SATHEESARAN 2016-08-19 14:35:03 UTC
Log from engine.log

<snip>
2016-08-19 20:06:43,946 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-56) [1939713e] The message key 'ManageGlusterService' is missing from 'bundles/ExecutionMessages'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:44,014 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Running command: ManageGlusterServiceCommand internal: false. Entities affected :  ID: 65b0678a-d103-461b-a0b2-28145452a3ec Type: ClusterAction group MANIPULATE_GLUSTER_SERVICE with role type ADMIN
2016-08-19 20:06:44,023 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] START, ManageGlusterServiceVDSCommand(HostName = RHGS-Node-1, GlusterServiceVDSParameters:{runAsync='true', hostId='a0aaf01c-b09a-40d0-ad8d-979643702e45'}), log id: 12383b4d
2016-08-19 20:06:44,991 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterServersListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 53a9b99d
2016-08-19 20:06:46,240 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterServersListVDSCommand, return: [10.70.37.187/23:CONNECTED, dhcp37-157.lab.eng.blr.redhat.com:CONNECTED, dhcp37-162.lab.eng.blr.redhat.com:CONNECTED], log id: 53a9b99d
2016-08-19 20:06:46,247 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterVolumesListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, GlusterVolumesListVDSParameters:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 46932c9f
2016-08-19 20:06:46,425 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterVolumesListVDSCommand, return: {}, log id: 46932c9f
2016-08-19 20:06:46,568 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler10) [] Fetched 0 VMs from VDS '0887ebef-20f0-456d-9f60-f6d467c7027a'
2016-08-19 20:06:47,035 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler5) [5ff01c3b] Fetched 0 VMs from VDS 'a0aaf01c-b09a-40d0-ad8d-979643702e45'
2016-08-19 20:06:48,600 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] FINISH, ManageGlusterServiceVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@de213865], log id: 12383b4d
2016-08-19 20:06:48,638 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-56) [1939713e] Correlation ID: 1939713e, Job ID: dc7075de-cccb-42be-be94-957abeb86768, Call Stack: null, Custom Event ID: -1, Message: GLUSTER service re-started on host RHGS-Node-1 of cluster RHGS_Cluster_1.
2016-08-19 20:06:48,646 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock freed to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'

</snip>

Comment 2 SATHEESARAN 2016-08-20 02:49:52 UTC
Another test:

1. Stopped glusterd on one of the RHGS/Gluster nodes from the backend
2. The node was shown as non-operational in the RHV UI
3. Left the node as it is, ** did not restart the glusterd service **
4. The glusterd service was started automatically after about 3 minutes, and the node was shown as UP in the UI (one way to watch this transition from the command line is sketched below)
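
Not part of the original report: one way to watch the host status flip back from the command line by polling the REST API (engine.example.com and the admin credentials are placeholders, and the <status> element layout is assumed from the oVirt 4.x API):

# watch -n 10 "curl -s -k -u admin@internal:PASSWORD https://engine.example.com/ovirt-engine/api/hosts | grep -o '<status>[a-z_]*</status>'"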

Comment 3 Yaniv Kaul 2016-12-01 13:20:57 UTC
Sahina, is it going to make it to 4.0.6?

Comment 4 Yaniv Kaul 2016-12-08 14:35:49 UTC
(In reply to Yaniv Kaul from comment #3)
> Sahina, is it going to make it to 4.0.6?

I reckon not, postponing to 4.0.7.

Comment 5 Sandro Bonazzola 2017-01-25 07:56:57 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 6 SATHEESARAN 2017-04-05 10:17:38 UTC
Tested with RHV 4.1.1-6

When glusterd is stopped, the node is now still shown as UP, and the prompt to restart the glusterd service appears under the 'General' subtab. When restarting the glusterd service from the UI, all works well.