Bug 1213291 - [RFE][HC] Should not allow a gluster host to move to maintenance if quorum is not met
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: ---
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ovirt-4.1.0-alpha
Target Release: ---
Assignee: Ramesh N
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 1293792
Depends On:
Blocks: Generic_Hyper_Converged_Host Gluster-HC-2 1422320
Reported: 2015-04-20 09:30 UTC by Sahina Bose
Modified: 2017-03-23 01:35 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This update introduced a check in the host maintenance flow to ensure that glusterFS quorum can be maintained for all glusterFS volumes which have the 'cluster.quorum-type' option set. Similarly, there is a new check to ensure that the host moving to maintenance doesn't have a glusterFS brick which is a source of volume self-healing. These checks are performed by default when moving the host to maintenance. There is an option in the Manager to skip these checks, but skipping them can bring your system to a halt. This option should be used only in extreme cases.
Clone Of:
Environment:
Last Closed: 2017-02-15 14:55:24 UTC
oVirt Team: Gluster
sabose: ovirt-4.1?
ylavi: planning_ack?
sabose: devel_ack+
sasundar: testing_ack+


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 43773 master MERGED engine: check gluster params while moving Host to maintenance 2016-09-20 05:48:27 UTC
oVirt gerrit 59102 master NEW webadmin: Enable force option in host maintenance 2016-09-23 13:14:32 UTC

Description Sahina Bose 2015-04-20 09:30:54 UTC
Description of problem:

In HC mode, if hosts are moved into maintenance such that cluster quorum is no longer available, VMs will be paused or become unavailable.

We need to ensure the following checks pass before a gluster-enabled host is moved to maintenance:
1. Quorum is still met after the host is moved to maintenance
2. Self-heal is not ongoing on any of the volumes
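
As a rough illustration of the second check, the sketch below parses text in the shape printed by `gluster volume heal <VOLNAME> info` and reports whether any brick still has pending heal entries. The function names are hypothetical, and the parser assumes the per-brick "Brick ..." and "Number of entries: N" lines; it is not the engine's implementation.

```python
import re
from typing import Dict


def pending_heal_entries(heal_info_text: str) -> Dict[str, int]:
    """Map each brick to its pending heal entry count.

    Assumes each brick section starts with a 'Brick <host>:<path>' line and
    contains a 'Number of entries: <N>' line. Bricks whose count is not a
    number (for example a disconnected brick) are skipped.
    """
    counts: Dict[str, int] = {}
    brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif brick is not None:
            match = re.match(r"Number of entries:\s*(\d+)$", line)
            if match:
                counts[brick] = int(match.group(1))
                brick = None
    return counts


def heal_in_progress(heal_info_text: str) -> bool:
    """True if any brick reports at least one pending heal entry."""
    return any(n > 0 for n in pending_heal_entries(heal_info_text).values())
```

A caller would run this per volume and refuse the maintenance request while any volume still reports pending entries.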

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
NA

Comment 1 Ramesh N 2015-07-29 05:10:22 UTC
We will use the following logic to determine if a host can be moved to maintenance or not.

1. If the node is not up, we can allow the host to move to maintenance. This is important because when nodes become non-operational, we have to move them back to maintenance to re-install them.
2. If no volume is running in the cluster, we can allow any node in the cluster to move to maintenance.
3. If the volume option "cluster.server-quorum-type" is not set to 'server' for any of the volumes in the cluster, we are not required to enforce quorum, so we can allow any host to move to maintenance.
4. If a quorum ratio of > 0.5 can still be met, we can allow the host to move to maintenance.
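
The four rules above can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Volume` class, the function name, and the peer counts passed in are hypothetical, and rule 4 is read as "peers still up after this host leaves, divided by all peers, must exceed 0.5".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    # Hypothetical minimal model of a gluster volume for this sketch.
    name: str
    running: bool
    options: Dict[str, str] = field(default_factory=dict)


def can_move_to_maintenance(host_is_up: bool, volumes: List[Volume],
                            total_peers: int, peers_up: int) -> bool:
    # Rule 1: a host that is not up may always be moved to maintenance.
    if not host_is_up:
        return True
    # Rule 2: nothing to protect if no volume is running.
    running = [v for v in volumes if v.running]
    if not running:
        return True
    # Rule 3: server quorum is enforced only if some running volume asks for it.
    if not any(v.options.get("cluster.server-quorum-type") == "server"
               for v in running):
        return True
    # Rule 4: the ratio of peers left up must stay strictly above 0.5.
    remaining = peers_up - 1
    return remaining / total_peers > 0.5
```

With three peers all up, the first host may leave (2/3 > 0.5). With one peer already down, a second request is refused (1/3 is not > 0.5).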

Comment 2 Red Hat Bugzilla Rules Engine 2015-10-19 11:03:23 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 3 SATHEESARAN 2016-02-10 15:46:16 UTC
(In reply to Ramesh N from comment #1)
> We will use the following logic to determine if a host can be moved to
> maintenance or not.
> 
> 1. If the node is not up, we can allow the host to move to maintenance.
> This is important because when nodes become non-operational, we have to
> move them back to maintenance to re-install them.
> 2. If no volume is running in the cluster, we can allow any node in the
> cluster to move to maintenance.
> 3. If the volume option "cluster.server-quorum-type" is not set to 'server'
> for any of the volumes in the cluster, we are not required to enforce
> quorum, so we can allow any host to move to maintenance.
> 4. If a quorum ratio of > 0.5 can still be met, we can allow the host to
> move to maintenance.

You also have to think about client-side quorum: if "quorum-type" is set on any volume, then you also can't have more than 2 nodes in maintenance.
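
A minimal sketch of such a client-side check for one replica set follows. The function name and arguments are hypothetical. It assumes the documented meaning of 'cluster.quorum-type': "auto" needs more than half of the replica's bricks up, and "fixed" needs at least 'cluster.quorum-count' bricks up. The sketch deliberately omits the extra tie-break gluster applies to even-sized replica sets.

```python
from typing import Iterable, Optional, Sequence


def client_quorum_kept(replica_set_hosts: Sequence[str],
                       hosts_down: Iterable[str],
                       quorum_type: str = "auto",
                       quorum_count: Optional[int] = None) -> bool:
    """Return True if the replica set keeps client quorum with hosts_down offline."""
    down = set(hosts_down)
    bricks_up = sum(1 for host in replica_set_hosts if host not in down)
    if quorum_type == "fixed":
        if quorum_count is None:
            raise ValueError("quorum_count is required when quorum_type is 'fixed'")
        return bricks_up >= quorum_count
    if quorum_type == "auto":
        # Strict majority only; the even-replica tie-break is not modelled.
        return 2 * bricks_up > len(replica_set_hosts)
    # 'none' or unset: client quorum is not enforced.
    return True
```

For a replica 3 volume with "auto", the check passes with one host down and fails with two, so a maintenance request for the second host would be the one to reject.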

Comment 4 Sandro Bonazzola 2016-05-02 10:04:25 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 5 Yaniv Lavi 2016-05-23 13:19:05 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 6 Yaniv Lavi 2016-05-23 13:22:56 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 7 Ramesh N 2016-11-02 11:48:25 UTC
*** Bug 1293792 has been marked as a duplicate of this bug. ***

Comment 8 Sandro Bonazzola 2016-12-12 13:57:21 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move back to modified.

Comment 9 SATHEESARAN 2017-02-09 02:56:58 UTC
Tested with RHV 4.1 Beta1 ( Red Hat Virtualization Manager Version: 4.1.0.3-0.1.el7 )

1. In a cluster with gluster+virt services enabled and with 3 nodes.
2. Moved one of the nodes to maintenance, choosing to stop gluster services.
3. Tried moving another node to maintenance, again choosing to stop gluster services.

The action failed with a proper error message: "Error while executing action: Cannot switch the following Host(s) to Maintenance mode: host3.
Gluster quorum will be lost for the following Volumes: engine"

