Description of problem:
-----------------------
Stopped the glusterd service on a RHGS/Gluster node. The node moved to the non-operational state, and the 'General' tab for that node showed the action item 'Restart glusterd service'. After clicking 'Restart glusterd service', the glusterd service actually came up on that node, but the node remained in the non-operational state.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.0.2-7

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add a RHGS/Gluster node to a 3.6 gluster-only cluster
2. Stop the glusterd service on that node from the backend:
   # systemctl stop glusterd
3. Once the RHGS/Gluster node moves to non-operational, click on the host and select the action 'Restart glusterd service' from the 'General' tab

Actual results:
---------------
The glusterd service actually came up on that RHGS/Gluster node, but the UI still showed the host as non-operational. After some time (around 4 minutes), the node came up, reported as auto-recovered.

Expected results:
-----------------
The node should come up as soon as the glusterd service is restarted.

Additional info:
----------------
The node auto-recovers around 4 minutes after the glusterd service is restarted.
Log from engine.log:

<snip>
2016-08-19 20:06:43,946 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-56) [1939713e] The message key 'ManageGlusterService' is missing from 'bundles/ExecutionMessages'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:44,014 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Running command: ManageGlusterServiceCommand internal: false. Entities affected : ID: 65b0678a-d103-461b-a0b2-28145452a3ec Type: ClusterAction group MANIPULATE_GLUSTER_SERVICE with role type ADMIN
2016-08-19 20:06:44,023 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] START, ManageGlusterServiceVDSCommand(HostName = RHGS-Node-1, GlusterServiceVDSParameters:{runAsync='true', hostId='a0aaf01c-b09a-40d0-ad8d-979643702e45'}), log id: 12383b4d
2016-08-19 20:06:44,991 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterServersListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 53a9b99d
2016-08-19 20:06:46,240 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterServersListVDSCommand, return: [10.70.37.187/23:CONNECTED, dhcp37-157.lab.eng.blr.redhat.com:CONNECTED, dhcp37-162.lab.eng.blr.redhat.com:CONNECTED], log id: 53a9b99d
2016-08-19 20:06:46,247 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterVolumesListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, GlusterVolumesListVDSParameters:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 46932c9f
2016-08-19 20:06:46,425 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterVolumesListVDSCommand, return: {}, log id: 46932c9f
2016-08-19 20:06:46,568 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler10) [] Fetched 0 VMs from VDS '0887ebef-20f0-456d-9f60-f6d467c7027a'
2016-08-19 20:06:47,035 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler5) [5ff01c3b] Fetched 0 VMs from VDS 'a0aaf01c-b09a-40d0-ad8d-979643702e45'
2016-08-19 20:06:48,600 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] FINISH, ManageGlusterServiceVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@de213865], log id: 12383b4d
2016-08-19 20:06:48,638 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-56) [1939713e] Correlation ID: 1939713e, Job ID: dc7075de-cccb-42be-be94-957abeb86768, Call Stack: null, Custom Event ID: -1, Message: GLUSTER service re-started on host RHGS-Node-1 of cluster RHGS_Cluster_1.
2016-08-19 20:06:48,646 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock freed to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
</snip>
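As a sketch, the restart audit message from the snippet above can be filtered out of the engine log with a small shell helper. The helper name `extract_restart_events` is hypothetical (not part of oVirt); the log path shown in the comment is the stock oVirt engine log location.

```shell
# Hypothetical helper: filter lines for the glusterd restart audit message.
# Reads from stdin so it can be piped or redirected.
extract_restart_events() {
  grep 'GLUSTER service re-started on host'
}

# Example usage against the default engine log path:
#   extract_restart_events < /var/log/ovirt-engine/engine.log
```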
Another test:
1. Stopped glusterd on one of the RHGS/Gluster nodes from the backend
2. The node was shown as non-operational in the RHV UI
3. Left the node as it is, ** did not restart the glusterd service **
4. The glusterd service was started automatically after another ~3 minutes, and the node was shown as UP in the UI
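To time the auto-recovery described above without watching the UI, one can poll the service state from the backend. This is a minimal sketch; the helper name `wait_until` and the 300-second/10-second timeout and interval are arbitrary choices, not part of the product.

```shell
# Hypothetical polling helper: retry a command every $2 seconds until it
# succeeds, giving up after $1 seconds. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
  done
  return 0
}

# e.g. wait up to 5 minutes for glusterd to report active again:
#   wait_until 300 10 systemctl is-active --quiet glusterd
```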
Sahina, is it going to make it to 4.0.6?
(In reply to Yaniv Kaul from comment #3)
> Sahina, is it going to make it to 4.0.6?

I reckon not, postponing to 4.0.7.
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.
Tested with RHV 4.1.1-6.
When glusterd is stopped, the node is still shown as UP, but the prompt to restart the glusterd service appears under the 'General' subtab. Restarting the glusterd service from the UI works well.