Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1262046

Summary:	[HC] When glusterd is down - show alert and provide an option to restart glusterd
Product:	[oVirt] ovirt-engine	Reporter:	SATHEESARAN <sasundar>
Component:	Frontend.WebAdmin	Assignee:	Sahina Bose <sabose>
Status:	CLOSED CURRENTRELEASE	QA Contact:	SATHEESARAN <sasundar>
Severity:	high	Docs Contact:
Priority:	medium
Version:	---	CC:	bugs, pstehlik, rnachimu, sabose, sasundar, ykaul
Target Milestone:	ovirt-4.0.1	Keywords:	Improvement, ZStream
Target Release:	4.0.1.1	Flags:	rule-engine: ovirt-4.0.z+ rule-engine: planning_ack+ sabose: devel_ack+ sasundar: testing_ack+
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	Feature: In a hyperconverged cluster, monitor glusterd status and raise an alert on the host if glusterd is not running. An option to restart glusterd is provided when it is not running on the host. Reason: Allow users to monitor glusterd status on an HC setup. Result: As expected.	Story Points:	---
Clone Of:		Environment:	RHEL 7.1 + gluster Hyperconverged environment
Last Closed:	2016-08-22 06:03:36 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Gluster	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1277939

Description SATHEESARAN 2015-09-10 17:26:36 UTC

Description of problem:
-----------------------
In a cluster with both the virt and gluster services enabled, when glusterd (the gluster management daemon) goes down, the node is expected to become Non-Operational.

However, this happens only when glusterd goes down on the first node added to the cluster. The other nodes in the cluster are still shown as Operational even when glusterd is down on them.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster mainline (3.8dev)
RHEVM 3.5.4

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Install the glusterfs server on the hypervisors
2. Add these hypervisors to a cluster with both the virt and gluster services enabled
3. Stop glusterd on a node (not the first node that was added to the cluster)

Actual results:
---------------
The hypervisors on which glusterd was down were still shown as Operational.

Expected results:
-----------------
The hypervisors should move to Non-Operational when glusterd is down.

Comment 1 Ramesh N 2015-10-15 05:48:23 UTC

Looks like this was done deliberately to avoid unnecessary VM migrations/pauses when glusterd is stopped on a node where both gluster and virt services are running. Also, the node will not be removed from the cluster when the user runs peer detach in the gluster CLI. We have to find a better way to handle this case.

  I feel moving the node to Non-Operational is not the right solution, as it will pause/migrate the VMs. Maybe we can show a warning symbol on the host and also raise an alert saying glusterd has stopped on the node.

  Also, we have to find a way to manage gluster peer detach in the hyperconverged use case.

Comment 2 SATHEESARAN 2015-10-15 09:43:03 UTC

(In reply to Ramesh N from comment #1)
> Looks like this was done deliberately to avoid unnecessary VM
> migrations/pauses when glusterd is stopped on a node where both gluster and
> virt services are running. Also, the node will not be removed from the
> cluster when the user runs peer detach in the gluster CLI. We have to find a
> better way to handle this case.
> 
>   I feel moving the node to Non-Operational is not the right solution, as it
> will pause/migrate the VMs. Maybe we can show a warning symbol on the host
> and also raise an alert saying glusterd has stopped on the node.
> 
>   Also, we have to find a way to manage gluster peer detach in the
> hyperconverged use case.

Thanks for that information.

I agree that marking the converged host as Non-Operational when glusterd goes down would result in unwanted VM migrations.

It would be good to have some form of representation or indication when glusterd is down.

I don't understand your statement "Also, we have to find a way to manage gluster peer detach in the hyperconverged use case." What is the problem here?

Comment 3 Red Hat Bugzilla Rules Engine 2015-12-11 02:19:38 UTC

Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags and only then set the target milestone.

Comment 4 Sahina Bose 2016-03-31 09:36:30 UTC

We should try to restart glusterd via systemd when it crashes, and if it still fails to restart, move the host to Non-Operational.
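A minimal sketch of the policy proposed in comment 4, written in Python for illustration only (this is not ovirt-engine code). The restart_glusterd hook stands in for asking systemd to restart the service, and HostStatus and handle_glusterd_down are hypothetical names:

```python
from enum import Enum
from typing import Callable


class HostStatus(Enum):
    # Hypothetical subset of host states, named after those discussed in this bug.
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


def handle_glusterd_down(restart_glusterd: Callable[[], bool],
                         max_attempts: int = 3) -> HostStatus:
    """Policy proposed in comment 4: retry the restart, then give up.

    restart_glusterd is a hypothetical hook standing in for asking systemd
    to restart the glusterd service; it returns True if glusterd came back.
    """
    for _ in range(max_attempts):
        if restart_glusterd():
            # glusterd is running again, so the host can stay Up.
            return HostStatus.UP
    # Every restart attempt failed: fall back to moving the host to Non-Operational.
    return HostStatus.NON_OPERATIONAL
```

For example, a hook that fails once and then succeeds yields HostStatus.UP on the second attempt, while a hook that always fails yields HostStatus.NON_OPERATIONAL after max_attempts tries.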

Comment 5 Ramesh N 2016-06-20 09:46:59 UTC

(In reply to SATHEESARAN from comment #2)
> (In reply to Ramesh N from comment #1)
> > Looks like this was done deliberately to avoid unnecessary VM
> > migrations/pauses when glusterd is stopped on a node where both gluster
> > and virt services are running. Also, the node will not be removed from the
> > cluster when the user runs peer detach in the gluster CLI. We have to find
> > a better way to handle this case.
> > 
> >   I feel moving the node to Non-Operational is not the right solution, as
> > it will pause/migrate the VMs. Maybe we can show a warning symbol on the
> > host and also raise an alert saying glusterd has stopped on the node.
> > 
> >   Also, we have to find a way to manage gluster peer detach in the
> > hyperconverged use case.
> 
> Thanks for that information.
> 
> I agree that marking the converged host as Non-Operational when glusterd
> goes down would result in unwanted VM migrations.
> 
> It would be good to have some form of representation or indication when
> glusterd is down.
> 
> I don't understand your statement "Also, we have to find a way to manage
> gluster peer detach in the hyperconverged use case." What is the problem
> here?


I was talking about what should happen to an HC host when it is removed from the gluster cluster using 'gluster peer detach'. We have now decided to show an alert when a host is detached from the gluster cluster.

Comment 6 Sahina Bose 2016-06-20 09:54:24 UTC

Changed the title to indicate what the bug addresses:
When a gluster host is detected as Disconnected or detached, an alert is shown in the UI for hosts that have both the virt and gluster services running. The host is not moved to Non-Operational in such cases, to prevent VM migrations.

An option is provided in the UI to restart glusterd in the case of disconnected hosts.
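The behavior described above can be sketched as a small decision function. This is a hypothetical Python illustration, not the actual ovirt-engine implementation; evaluate_hc_host, HostAction and GlusterPeerState are invented names, and hosts that do not run both services are deliberately left out of scope:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GlusterPeerState(Enum):
    # Hypothetical peer states, mirroring the cases named in comment 6.
    CONNECTED = "Connected"
    DISCONNECTED = "Disconnected"
    DETACHED = "Detached"


@dataclass
class HostAction:
    alert: Optional[str]           # alert text to show in the UI, if any
    offer_restart_glusterd: bool   # show a "restart glusterd" option for the host
    move_to_non_operational: bool  # always False for HC hosts in this sketch


def evaluate_hc_host(name: str, virt_service: bool, gluster_service: bool,
                     peer_state: GlusterPeerState) -> Optional[HostAction]:
    """Decide how the UI reacts to a gluster problem on a host.

    Returns None for hosts that do not run both virt and gluster services;
    their existing handling is outside the scope of this sketch.
    """
    if not (virt_service and gluster_service):
        return None
    if peer_state is GlusterPeerState.CONNECTED:
        return HostAction(alert=None, offer_restart_glusterd=False,
                          move_to_non_operational=False)
    if peer_state is GlusterPeerState.DISCONNECTED:
        # glusterd is not running: alert and offer a restart, but keep VMs in place.
        return HostAction(alert=f"glusterd is not running on host {name}",
                          offer_restart_glusterd=True,
                          move_to_non_operational=False)
    # Detached via 'gluster peer detach': alert only; per comment 6 the restart
    # option is offered only for disconnected hosts.
    return HostAction(alert=f"Host {name} was detached from the gluster cluster",
                      offer_restart_glusterd=False,
                      move_to_non_operational=False)
```

Keeping move_to_non_operational as an explicit field makes the decision from comments 1 and 6 (do not migrate VMs just because glusterd is down on an HC host) visible wherever the result is consumed.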

Comment 7 Pavel Stehlik 2016-08-22 06:03:36 UTC

Closing due to capacity reasons. If this still happens, please reopen.