Bug 1262046 - [HC] When glusterd is down - show alert and provide an option to restart glusterd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: ---
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.1.1
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-2
Reported: 2015-09-10 17:26 UTC by SATHEESARAN
Modified: 2016-08-22 06:03 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: In a hyperconverged cluster, monitor glusterd status and show an alert on the host if glusterd is not running. An option to restart glusterd is provided when it is not running on the host.
Reason: Allows users to monitor gluster status in an HC setup.
Result: As expected.
Clone Of:
Environment:
RHEL 7.1 + gluster Hyperconverged environment
Last Closed: 2016-08-22 06:03:36 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
sabose: devel_ack+
sasundar: testing_ack+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 58265 | 0 | master | MERGED | engine: Add gluster peer status | 2016-06-20 16:43:19 UTC
oVirt gerrit | 59403 | 0 | master | MERGED | engine: Alert for disconnected gluster peer | 2016-06-22 05:05:14 UTC
oVirt gerrit | 59583 | 0 | ovirt-engine-4.0 | MERGED | engine: Add gluster peer status | 2016-06-29 11:49:11 UTC
oVirt gerrit | 59584 | 0 | ovirt-engine-4.0 | MERGED | engine: moved gluster methods from ClusterUtils | 2016-06-29 11:49:40 UTC
oVirt gerrit | 59585 | 0 | ovirt-engine-4.0 | MERGED | engine: Alert for disconnected gluster peer | 2016-06-29 11:49:27 UTC

Description SATHEESARAN 2015-09-10 17:26:36 UTC
Description of problem:
-----------------------
In a cluster with both virt service and gluster service enabled, when glusterd (the management daemon) goes down, the node is expected to go non-operational.

This happens only when glusterd goes down on the first node. The other nodes in the cluster are still shown as operational, even when glusterd is down on them.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster mainline ( 3.8dev )
RHEVM 3.5.4

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Install glusterfs server on the hypervisors
2. Add these hypervisors to the cluster with virt-service and gluster-service enabled
3. Stop glusterd on the node ( not on the first node that was added to the cluster )

Actual results:
---------------
The hypervisors where glusterd was down were still shown as operational.

Expected results:
-----------------
The hypervisors should be non-operational when glusterd is down.
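
To confirm the glusterd state on a hypervisor after step 3, independently of what the engine UI shows, something like the following could be run on the node. This is a minimal sketch, assuming the service is managed by systemd under the unit name glusterd; the function name and the injectable `run` parameter are illustrative only and are not part of oVirt or gluster.

```python
import subprocess


def glusterd_is_active(run=subprocess.run):
    """Return True if systemd reports the glusterd unit as active.

    `run` is injectable (hypothetical convenience) so the check can be
    exercised without a real systemd.
    """
    # `systemctl is-active <unit>` prints the unit state and exits 0
    # only when the unit is active.
    result = run(
        ["systemctl", "is-active", "glusterd"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "active"
```

Calling `glusterd_is_active()` on each hypervisor and comparing the result with the host status shown in the engine would make the mismatch described above easy to see.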

Comment 1 Ramesh N 2015-10-15 05:48:23 UTC
Looks like this was done deliberately to avoid unnecessary VM migrations/pauses when glusterd is stopped on a node where both gluster and virt services are running. Also, the node is not removed from the cluster when the user runs peer detach in the gluster CLI. We have to find a better way to handle this case.

  I feel moving the node to non-operational is not the right solution, as it will pause/migrate the VMs. Maybe we can show a warning symbol on the host, and also raise an alert saying glusterd has stopped on the node.

  Also, we have to find a way to manage gluster peer detach in the hyperconverged use case.

Comment 2 SATHEESARAN 2015-10-15 09:43:03 UTC
(In reply to Ramesh N from comment #1)
> Looks like this was done deliberately to avoid necessary VM migrations/pause
> when glusterd stopped in a node where both gluster and virt services are
> running. Also node will not be removed from Cluster when user has run peer
> detach in gluster CLI. We have to find out a better way to handle this case.
> 
>   I feel moving the node to non-operational is not a right solution as it
> will pause/migrate the VMs. May be we can give a warning symbol in the host
> also we can raise an alert saying glusterd stopped in the node. 
> 
>   Also we have to find out a way to manage gluster peer detach in hyper
> converged use case.

Thanks for that information.

I agree that marking the converged host as non-operational when glusterd goes down would result in unwanted VM migrations.

It's good to have some form of representation or indication when glusterd is down.

I don't understand your statement: "Also we have to find out a way to manage gluster peer detach in hyper converged use case." What is the problem around this?

Comment 3 Red Hat Bugzilla Rules Engine 2015-12-11 02:19:38 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 4 Sahina Bose 2016-03-31 09:36:30 UTC
When glusterd has crashed, we should try to restart it via systemd, and if it still fails to restart, move the host to Non-Operational.
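
The policy proposed in this comment could be sketched roughly as below. This is only an illustration of "restart first, escalate if that fails", not the engine's implementation: it assumes a systemd unit named glusterd, and the function name, the `attempts` parameter, and the "Up" / "NonOperational" return labels are hypothetical.

```python
import subprocess


def restart_or_escalate(run=subprocess.run, attempts=1):
    """Try to restart glusterd; return the host state that would follow.

    Returns "Up" if glusterd is active after a restart attempt,
    otherwise "NonOperational" (both labels are illustrative).
    """
    for _ in range(attempts):
        # Ask systemd to restart the daemon; ignore the exit code here
        # and rely on the explicit status check below.
        run(["systemctl", "restart", "glusterd"], check=False)
        # `is-active --quiet` prints nothing and exits 0 only when active.
        status = run(
            ["systemctl", "is-active", "--quiet", "glusterd"], check=False
        )
        if status.returncode == 0:
            return "Up"
    return "NonOperational"
```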

Comment 5 Ramesh N 2016-06-20 09:46:59 UTC
(In reply to SATHEESARAN from comment #2)
> (In reply to Ramesh N from comment #1)
> > Looks like this was done deliberately to avoid necessary VM migrations/pause
> > when glusterd stopped in a node where both gluster and virt services are
> > running. Also node will not be removed from Cluster when user has run peer
> > detach in gluster CLI. We have to find out a better way to handle this case.
> > 
> >   I feel moving the node to non-operational is not a right solution as it
> > will pause/migrate the VMs. May be we can give a warning symbol in the host
> > also we can raise an alert saying glusterd stopped in the node. 
> > 
> >   Also we have to find out a way to manage gluster peer detach in hyper
> > converged use case.
> 
> Thanks for that information.
> 
> I agree to the fact that marking the converged host as non-operational, when
> glusterd goes down, would result in unwanted VM migrations.
> 
> Its good to have some form of representation or indication when glusterd was
> down.
> 
> I don't understand your statement saying - "Also we have to find out a way
> to manage gluster peer detach in hyper converged use case.". What is the
> problem around this?


I was talking about what should happen to an HC host when it is removed from the gluster cluster using 'gluster peer detach'. We have now decided to show an alert when a host is detached from the gluster cluster.

Comment 6 Sahina Bose 2016-06-20 09:54:24 UTC
Changed the title to indicate what the bug addresses:
When a gluster host is detected as Disconnected or detached, an alert is shown in the UI for hosts that have both virt and gluster services running. The host is not moved to Non-Operational in such cases, to prevent VM migrations.

An option is provided in the UI to restart glusterd in case of disconnected hosts.
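
As a rough illustration of the behaviour described in this comment, the sketch below classifies hosts from the text output of `gluster peer status`. It is an assumption-laden example, not how the engine does it: the function names are hypothetical, the parsing relies on the usual "Hostname:" / "State: Peer in Cluster (Connected)" layout of that command, and `expected_hosts` is assumed to list the other hosts of the cluster (the command does not list the node it is run on).

```python
def parse_peer_status(output):
    """Map each peer hostname to its state from `gluster peer status` text."""
    peers = {}
    hostname = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            # Typical line: "State: Peer in Cluster (Connected)"
            if "(" in line:
                state = line.rsplit("(", 1)[-1].rstrip(")")
            else:
                state = "Unknown"
            peers[hostname] = state
            hostname = None
    return peers


def hosts_needing_alert(expected_hosts, peers):
    """Return {host: reason} for hosts that should show an alert.

    A host missing from the peer list is treated as detached; a host
    listed with any state other than "Connected" is treated as
    disconnected. No host is moved to Non-Operational here.
    """
    alerts = {}
    for host in expected_hosts:
        state = peers.get(host)
        if state is None:
            alerts[host] = "detached"
        elif state != "Connected":
            alerts[host] = "disconnected"
    return alerts
```

Only the hosts reported as "disconnected" would be candidates for the restart-glusterd option; a "detached" host is no longer part of the gluster cluster, so restarting the daemon would not bring it back.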

Comment 7 Pavel Stehlik 2016-08-22 06:03:36 UTC
Closing due to capacity reasons. If this still happens, please reopen.

