Bug 1205724 - [RFE][HC] Host in maintenance mode should stop glusterd and glusterfsd processes
Summary: [RFE][HC] Host in maintenance mode should stop glusterd and glusterfsd processes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assignee: Ramesh N
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Generic_Hyper_Converged_Host 1205727 Gluster-HC-2
 
Reported: 2015-03-25 14:21 UTC by Sahina Bose
Modified: 2016-03-11 07:21 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: To help with upgrades and maintenance of gluster nodes, glusterd and related gluster services are stopped when a host is moved to Maintenance mode in the engine.
Clone Of:
Environment:
Last Closed: 2016-03-11 07:21:29 UTC
oVirt Team: Gluster
rule-engine: ovirt-3.6.z+
rule-engine: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 43725 master MERGED gluster: Stop gluster processes when host moves to maintenance Never
oVirt gerrit 48995 ovirt-engine-3.5-gluster MERGED gluster: Stop gluster processes when host moves to maintenance Never
oVirt gerrit 50306 ovirt-engine-3.6 MERGED gluster: Stop gluster processes when host moves to maintenance 2015-12-24 13:30:20 UTC
oVirt gerrit 50312 ovirt-3.6 MERGED gluster: Added VDSM verb to stop gluster related processes Never
oVirt gerrit 51099 ovirt-engine-3.6.2 MERGED gluster: Stop gluster processes when host moves to maintenance 2015-12-30 10:28:02 UTC
Red Hat Bugzilla 1303539 None None None Never

Internal Links: 1303539

Description Sahina Bose 2015-03-25 14:21:57 UTC
Description of problem:
If the gluster service is enabled on the host being put into maintenance, then the glusterd and brick (glusterfsd) processes on that host should be stopped.

This is required to stop clients from accessing the data if the host is going to be upgraded.
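
The intended host-side teardown would look roughly like the following; a minimal Python sketch, assuming a systemd-managed glusterd. The function name and error handling are illustrative, not the actual VDSM verb:

    import subprocess

    def stop_gluster_processes():
        # Stop the glusterd management daemon (assumes systemd).
        subprocess.check_call(["systemctl", "stop", "glusterd"])
        # Brick (glusterfsd) and auxiliary (glusterfs) processes are
        # not systemd units, so terminate them by name.
        for proc in ("glusterfsd", "glusterfs"):
            # pkill exits 1 when no process matched; treat that as OK.
            rc = subprocess.call(["pkill", proc])
            if rc not in (0, 1):
                raise RuntimeError("pkill %s failed (rc=%d)" % (proc, rc))

    if __name__ == "__main__":
        stop_gluster_processes()

Once glusterd and the brick processes are down, clients can no longer reach this host's copy of the data, which is what makes the upgrade safe.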

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
NA

Comment 1 Itamar Heim 2015-03-29 08:55:27 UTC
should this be optional?
should it be blocked if another host in the brick-set is still healing?

Comment 2 Sahina Bose 2015-04-06 12:16:50 UTC
I think in such cases, putting the host into maintenance mode should be blocked.
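
The heal check being discussed could be sketched as follows; a hedged sketch, assuming the standard `gluster volume heal <vol> info` CLI, whose output carries a "Number of entries: N" line per brick. Function and volume names are illustrative:

    import re
    import subprocess

    def pending_heal_entries(volume):
        # Ask gluster for the per-brick list of files awaiting heal.
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"],
            universal_newlines=True)
        # Sum the "Number of entries: N" counters across all bricks.
        return sum(int(n) for n in
                   re.findall(r"Number of entries:\s*(\d+)", out))

    def safe_to_enter_maintenance(volumes):
        # Block maintenance while any brick still has pending heals.
        return all(pending_heal_entries(v) == 0 for v in volumes)

A non-zero total would mean another host in the brick-set is still healing, so the move to maintenance should be refused.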

Comment 3 Shubhendu Tripathi 2015-07-14 09:04:44 UTC
BZs #1213291 and #1196433 would take care of not allowing other nodes to be moved to maintenance state.

Comment 4 Red Hat Bugzilla Rules Engine 2015-10-19 10:52:37 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 5 Sahina Bose 2015-12-11 06:24:53 UTC
Retargeting as this is required for HC mode operations

Comment 6 Sandro Bonazzola 2015-12-22 13:08:36 UTC
Please don't leave bugs assigned to bugs@ovirt.org when you take it.

Comment 7 Sahina Bose 2015-12-23 05:32:21 UTC
Moving the host checks performed before entering maintenance to a separate BZ.

Comment 8 Sandro Bonazzola 2015-12-23 13:41:28 UTC
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA

Comment 9 Ian Morgan 2016-01-31 16:28:34 UTC
> Itamar Heim 2015-03-29 04:55:27 EDT
>
>should this be optional?
>should it be blocked if another host in the brick-set is still healing?

> Shubhendu Tripathi 2015-07-14 05:04:44 EDT
>
>The BZs #1213291 and #1196433 would take care of not allowing other
>nodes to maintenance state.

I definitely agree that this should be optional, and definitely NOT the default behaviour, until the two above-noted BZs are implemented as well! As it stands now, 3.6.2 released with this change has a very high likelihood of breaking my oVirt cluster when moving a node to maintenance without performing very careful checks on the nodes first.

Even if the nodes are in a state where it is "safe" to stop one of the glusterd instances, doing so will unnecessarily require a heal to be performed after re-activating the node.

Perhaps this isn't such a big deal in "big" clusters with dozens of nodes, but on a minimal 3-node cluster, taking one gluster node offline unnecessarily is a big risk.

There are many reasons to want to put the VDSM/hypervisor into maintenance, but NOT disrupt the gluster daemons on the same node!

As it works now (<= 3.6.1), I can safely move a node to maintenance from the vdsm/hypervisor point of view without affecting the operational state of the gluster volumes.

I am now stuck on 3.6.1 until this change is reverted or augmented to be optional when placing a node into maintenance.

I recommend that this change be reverted, and this BZ should be set to depend on the two above-noted BZs.

Comment 11 Sahina Bose 2016-02-01 08:51:15 UTC
Will rework this to make it optional; I agree with the user's concerns.

Raised bug 1303539 to track it.

Comment 12 SATHEESARAN 2016-02-25 10:29:24 UTC
Tested with RHEV 3.6.3.3 and RHGS 3.1.2 RC (glusterfs-3.7.5.19.el7rhgs).

Gluster services are no longer stopped unconditionally; instead, an option is provided to stop the gluster services while moving the host to maintenance.
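
For reference, the optional behaviour can be driven programmatically as in the sketch below, which uses the later oVirt Python SDK v4 (the stop_gluster_service parameter exists there; the exact 3.6-era API may differ, and the engine URL, credentials, and host name are placeholders):

    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="secret",
        insecure=True,  # skip CA verification; lab use only
    )
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search="name=host1")[0]
    # Ask the engine to also stop glusterd and the brick processes
    # while moving the host to maintenance.
    hosts_service.host_service(host.id).deactivate(
        stop_gluster_service=True)
    connection.close()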

