Bug 1262046 - [HC] When glusterd is down - show alert and provide an option to restart glusterd
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: ---
Hardware: x86_64 Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.1.1
Assigned To: Sahina Bose
QA Contact: SATHEESARAN
Keywords: Improvement, ZStream
Depends On:
Blocks: Gluster-HC-2
Reported: 2015-09-10 13:26 EDT by SATHEESARAN
Modified: 2016-08-22 02:03 EDT (History)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: In a hyperconverged cluster, glusterd status is monitored and an alert is shown on the host if glusterd is not running. An option to restart glusterd is provided when it is not running on the host.
Reason: Allows users to monitor gluster status in an HC setup.
Result: As expected.
Story Points: ---
Clone Of:
Environment:
RHEL 7.1 + gluster Hyperconverged environment
Last Closed: 2016-08-22 02:03:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
sabose: devel_ack+
sasundar: testing_ack+


External Trackers
Tracker       ID     Priority          Status  Summary                                          Last Updated
oVirt gerrit  58265  master            MERGED  engine: Add gluster peer status                  2016-06-20 12:43 EDT
oVirt gerrit  59403  master            MERGED  engine: Alert for disconnected gluster peer      2016-06-22 01:05 EDT
oVirt gerrit  59583  ovirt-engine-4.0  MERGED  engine: Add gluster peer status                  2016-06-29 07:49 EDT
oVirt gerrit  59584  ovirt-engine-4.0  MERGED  engine: moved gluster methods from ClusterUtils  2016-06-29 07:49 EDT
oVirt gerrit  59585  ovirt-engine-4.0  MERGED  engine: Alert for disconnected gluster peer      2016-06-29 07:49 EDT

Description SATHEESARAN 2015-09-10 13:26:36 EDT
Description of problem:
-----------------------
In a cluster with both the virt service and the gluster service enabled, a node is expected to go non-operational when glusterd (the management daemon) goes down on it.

However, this happens only when glusterd goes down on the first node. The other nodes in the cluster are still shown as operational, even when glusterd is down on them.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster mainline (3.8dev)
RHEVM 3.5.4

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Install glusterfs server on the hypervisors
2. Add these hypervisors to the cluster with virt-service and gluster-service enabled
3. Stop glusterd on a node (not the first node that was added to the cluster)

Actual results:
---------------
The hypervisors on which glusterd was down were still shown as operational.

Expected results:
-----------------
The hypervisors should be marked non-operational when glusterd is down.
Comment 1 Ramesh N 2015-10-15 01:48:23 EDT
It looks like this was done deliberately to avoid unnecessary VM migrations or pauses when glusterd is stopped on a node where both the gluster and virt services are running. Also, a node is not removed from the cluster when the user runs peer detach in the gluster CLI. We have to find a better way to handle this case.

I feel that moving the node to non-operational is not the right solution, as it will pause or migrate the VMs. Maybe we can show a warning symbol on the host and also raise an alert saying that glusterd has stopped on the node.

We also have to find a way to manage gluster peer detach in the hyperconverged use case.
Comment 2 SATHEESARAN 2015-10-15 05:43:03 EDT
(In reply to Ramesh N from comment #1)
> Looks like this was done deliberately to avoid unnecessary VM migrations/pause
> when glusterd stopped in a node where both gluster and virt services are
> running. Also node will not be removed from Cluster when user has run peer
> detach in gluster CLI. We have to find out a better way to handle this case.
> 
>   I feel moving the node to non-operational is not a right solution as it
> will pause/migrate the VMs. May be we can give a warning symbol in the host
> also we can raise an alert saying glusterd stopped in the node. 
> 
>   Also we have to find out a way to manage gluster peer detach in hyper
> converged use case.

Thanks for that information.

I agree that marking the converged host as non-operational when glusterd goes down would result in unwanted VM migrations.

It would be good to have some form of indication when glusterd is down.

I don't understand your statement: "Also we have to find out a way to manage gluster peer detach in hyper converged use case." What is the problem here?
Comment 3 Red Hat Bugzilla Rules Engine 2015-12-10 21:19:38 EST
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Comment 4 Sahina Bose 2016-03-31 05:36:30 EDT
We should try to restart glusterd via systemd when it has crashed, and if it still fails to restart, move the host to Non-Operational.
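A minimal sketch of the restart-then-escalate idea proposed in this comment, assuming glusterd is managed as the systemd unit `glusterd`. The function name `ensure_glusterd`, the `Runner` type, the `attempts` parameter, and the returned strings are hypothetical and are not taken from the engine code.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical hook: runs a command and returns its exit code.
# It is injectable so the logic can be exercised without systemd.
Runner = Callable[[Sequence[str]], int]

IS_ACTIVE = ["systemctl", "is-active", "--quiet", "glusterd"]
RESTART = ["systemctl", "restart", "glusterd"]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_glusterd(run: Runner = _run, attempts: int = 1) -> str:
    """Return 'up' if glusterd is active, restarting it if needed.

    Returns 'non_operational' if it is still inactive after the
    configured number of restart attempts.
    """
    if run(IS_ACTIVE) == 0:
        return "up"
    for _ in range(attempts):
        run(RESTART)
        if run(IS_ACTIVE) == 0:
            return "up"
    return "non_operational"
```

Note that the later comments settle on a different outcome: the host is not moved to Non-Operational, and an alert with a restart option is shown instead.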
Comment 5 Ramesh N 2016-06-20 05:46:59 EDT
(In reply to SATHEESARAN from comment #2)
> (In reply to Ramesh N from comment #1)
> > Looks like this was done deliberately to avoid unnecessary VM migrations/pause
> > when glusterd stopped in a node where both gluster and virt services are
> > running. Also node will not be removed from Cluster when user has run peer
> > detach in gluster CLI. We have to find out a better way to handle this case.
> > 
> >   I feel moving the node to non-operational is not a right solution as it
> > will pause/migrate the VMs. May be we can give a warning symbol in the host
> > also we can raise an alert saying glusterd stopped in the node. 
> > 
> >   Also we have to find out a way to manage gluster peer detach in hyper
> > converged use case.
> 
> Thanks for that information.
> 
> I agree to the fact that marking the converged host as non-operational, when
> glusterd goes down, would result in unwanted VM migrations.
> 
> Its good to have some form of representation or indication when glusterd was
> down.
> 
> I don't understand your statement saying - "Also we have to find out a way
> to manage gluster peer detach in hyper converged use case.". What is the
> problem around this?


I was talking about what should happen to an HC host when it is removed from the gluster cluster using 'gluster peer detach'. We have now decided to show an alert when a host is detached from the gluster cluster.
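As an illustration of how detached and disconnected hosts could be told apart, here is a rough sketch that parses `gluster peer status` text. It assumes the usual `Hostname:` / `Uuid:` / `State:` layout of that command's output. The function names, the sample host names, and the UUIDs are hypothetical, and this is not how the engine itself collects peer status.

```python
from typing import Dict, Iterable


def parse_peer_status(output: str) -> Dict[str, bool]:
    """Map each listed peer hostname to True if its State is '(Connected)'."""
    peers: Dict[str, bool] = {}
    hostname = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            peers[hostname] = "(Connected)" in line
            hostname = None
    return peers


def hosts_needing_alert(expected: Iterable[str], output: str) -> Dict[str, str]:
    """Classify expected peers as 'detached' (not listed) or 'disconnected'.

    'expected' should exclude the local host, which peer status does not list.
    """
    peers = parse_peer_status(output)
    alerts: Dict[str, str] = {}
    for host in expected:
        if host not in peers:
            alerts[host] = "detached"
        elif not peers[host]:
            alerts[host] = "disconnected"
    return alerts


# Made-up sample output with placeholder host names and UUIDs.
SAMPLE = """Number of Peers: 2

Hostname: host2.example.com
Uuid: 11111111-2222-3333-4444-555555555555
State: Peer in Cluster (Connected)

Hostname: host3.example.com
Uuid: 66666666-7777-8888-9999-000000000000
State: Peer in Cluster (Disconnected)
"""

alerts = hosts_needing_alert(
    ["host2.example.com", "host3.example.com", "host4.example.com"], SAMPLE
)
```

With this sample, `host3.example.com` is classified as disconnected and `host4.example.com`, which is absent from the peer list, as detached.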
Comment 6 Sahina Bose 2016-06-20 05:54:24 EDT
Changed the title to indicate what the bug addresses:
When a gluster host is detected as Disconnected or detached, an alert is shown in the UI for hosts that have both the virt and gluster services running. The host is not moved to Non-Operational in such cases, to prevent VM migrations.

An option is provided in the UI to restart glusterd in the case of disconnected hosts.
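The behaviour described in this comment can be summarised as a small decision function. This is only a sketch of the rules as stated here for hosts running both virt and gluster services; the type and function names (`PeerState`, `HostAction`, `action_for_hc_host`) are hypothetical and do not come from the merged patches.

```python
from dataclasses import dataclass
from enum import Enum


class PeerState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    DETACHED = "detached"


@dataclass(frozen=True)
class HostAction:
    show_alert: bool
    offer_glusterd_restart: bool
    move_to_non_operational: bool


def action_for_hc_host(state: PeerState) -> HostAction:
    """Decide the UI behaviour for a host running both virt and gluster."""
    if state is PeerState.CONNECTED:
        return HostAction(False, False, False)
    return HostAction(
        show_alert=True,
        # The restart option is described only for disconnected hosts.
        offer_glusterd_restart=(state is PeerState.DISCONNECTED),
        # The host stays operational so that VMs are not migrated.
        move_to_non_operational=False,
    )
```

Hosts that run only the gluster service are outside the scope of this sketch, since the comment does not describe them.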
Comment 7 Pavel Stehlik 2016-08-22 02:03:36 EDT
Closing due to capacity reasons. If this still happens, please reopen.
