Description of problem:
-----------------------
Stopped the glusterd service on a RHGS/Gluster node. The node moved to the non-operational state, and the 'General' tab for that node showed the action item 'Restart glusterd service'. After clicking 'Restart glusterd service', the glusterd service actually came up on that node, but the node remained in the non-operational state.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.0.2-7

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add a RHGS/Gluster node to a 3.6 gluster-only cluster
2. Stop the glusterd service on that node from the backend:
   # systemctl stop glusterd
3. Once the RHGS/Gluster node moves to non-operational, click on the host and select the action 'Restart glusterd service' from the 'General' tab

Actual results:
---------------
The glusterd service actually came up on that RHGS/Gluster node, but the UI still showed the host as non-operational. After some time (around 4 minutes), the node came up, reported as auto-recovered.

Expected results:
-----------------
The node should come up as soon as the glusterd service is restarted.

Additional info:
----------------
The node auto-recovers around 4 minutes after the glusterd service is restarted.
Log from engine.log:

<snip>
2016-08-19 20:06:43,946 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-56) [1939713e] The message key 'ManageGlusterService' is missing from 'bundles/ExecutionMessages'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:43,989 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-19 20:06:44,014 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Running command: ManageGlusterServiceCommand internal: false. Entities affected : ID: 65b0678a-d103-461b-a0b2-28145452a3ec Type: ClusterAction group MANIPULATE_GLUSTER_SERVICE with role type ADMIN
2016-08-19 20:06:44,023 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] START, ManageGlusterServiceVDSCommand(HostName = RHGS-Node-1, GlusterServiceVDSParameters:{runAsync='true', hostId='a0aaf01c-b09a-40d0-ad8d-979643702e45'}), log id: 12383b4d
2016-08-19 20:06:44,991 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterServersListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 53a9b99d
2016-08-19 20:06:46,240 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterServersListVDSCommand, return: [10.70.37.187/23:CONNECTED, dhcp37-157.lab.eng.blr.redhat.com:CONNECTED, dhcp37-162.lab.eng.blr.redhat.com:CONNECTED], log id: 53a9b99d
2016-08-19 20:06:46,247 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] START, GlusterVolumesListVDSCommand(HostName = dhcp37-187.lab.eng.blr.redhat.com, GlusterVolumesListVDSParameters:{runAsync='true', hostId='0887ebef-20f0-456d-9f60-f6d467c7027a'}), log id: 46932c9f
2016-08-19 20:06:46,425 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler1) [] FINISH, GlusterVolumesListVDSCommand, return: {}, log id: 46932c9f
2016-08-19 20:06:46,568 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler10) [] Fetched 0 VMs from VDS '0887ebef-20f0-456d-9f60-f6d467c7027a'
2016-08-19 20:06:47,035 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler5) [5ff01c3b] Fetched 0 VMs from VDS 'a0aaf01c-b09a-40d0-ad8d-979643702e45'
2016-08-19 20:06:48,600 INFO  [org.ovirt.engine.core.vdsbroker.gluster.ManageGlusterServiceVDSCommand] (default task-56) [1939713e] FINISH, ManageGlusterServiceVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@de213865], log id: 12383b4d
2016-08-19 20:06:48,638 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-56) [1939713e] Correlation ID: 1939713e, Job ID: dc7075de-cccb-42be-be94-957abeb86768, Call Stack: null, Custom Event ID: -1, Message: GLUSTER service re-started on host RHGS-Node-1 of cluster RHGS_Cluster_1.
2016-08-19 20:06:48,646 INFO  [org.ovirt.engine.core.bll.gluster.ManageGlusterServiceCommand] (default task-56) [1939713e] Lock freed to object 'EngineLock:{exclusiveLocks='[a0aaf01c-b09a-40d0-ad8d-979643702e45=<GLUSTER, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
</snip>
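As a sketch, the restart audit message from the snippet above can be filtered out of the engine log with a small shell helper. The helper name `extract_restart_events` is hypothetical (not part of oVirt); the log path shown in the comment is the stock oVirt engine log location.

```shell
# Hypothetical helper: filter lines for the glusterd restart audit message.
# Reads from stdin so it can be piped or redirected.
extract_restart_events() {
  grep 'GLUSTER service re-started on host'
}

# Example usage against the default engine log path:
#   extract_restart_events < /var/log/ovirt-engine/engine.log
```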
Another test:
1. Stopped glusterd on one of the RHGS/Gluster nodes from the backend
2. The node was shown as non-operational in the RHV UI
3. Left the node as it is, ** did not restart the glusterd service **
4. The glusterd service was started automatically after another ~3 minutes, and the node was shown as UP in the UI
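To time the auto-recovery described above without watching the UI, one can poll the service state from the backend. This is a minimal sketch; the helper name `wait_until` and the 300-second/10-second timeout and interval are arbitrary choices, not part of the product.

```shell
# Hypothetical polling helper: retry a command every $2 seconds until it
# succeeds, giving up after $1 seconds. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
  done
  return 0
}

# e.g. wait up to 5 minutes for glusterd to report active again:
#   wait_until 300 10 systemctl is-active --quiet glusterd
```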
Sahina, is it going to make it to 4.0.6?
(In reply to Yaniv Kaul from comment #3)
> Sahina, is it going to make it to 4.0.6?

I reckon not, postponing to 4.0.7.
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.
Tested with RHV 4.1.1-6.
When glusterd is stopped, the node is still shown as UP, but the prompt to restart the glusterd service appears under the 'General' subtab. Restarting the glusterd service from the UI works well.