Bug 1205724 - [RFE][HC] Host in maintenance mode should stop glusterd and glusterfsd processes
Summary: [RFE][HC] Host in maintenance mode should stop glusterd and glusterfsd processes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assignee: Ramesh N
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Generic_Hyper_Converged_Host 1205727 Gluster-HC-2
 
Reported: 2015-03-25 14:21 UTC by Sahina Bose
Modified: 2016-03-11 07:21 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: To help with upgrades and maintenance of gluster nodes, glusterd and related gluster services are stopped when a host is moved to Maintenance mode in the engine.
Clone Of:
Environment:
Last Closed: 2016-03-11 07:21:29 UTC
oVirt Team: Gluster
rule-engine: ovirt-3.6.z+
rule-engine: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 43725 master MERGED gluster: Stop gluster processes when host moves to maintenance Never
oVirt gerrit 48995 ovirt-engine-3.5-gluster MERGED gluster: Stop gluster processes when host moves to maintenance Never
oVirt gerrit 50306 ovirt-engine-3.6 MERGED gluster: Stop gluster processes when host moves to maintenance 2015-12-24 13:30:20 UTC
oVirt gerrit 50312 ovirt-3.6 MERGED gluster: Added VDSM verb to stop gluster related processes Never
oVirt gerrit 51099 ovirt-engine-3.6.2 MERGED gluster: Stop gluster processes when host moves to maintenance 2015-12-30 10:28:02 UTC
Red Hat Bugzilla 1303539 None None None Never

Internal Links: 1303539

Description Sahina Bose 2015-03-25 14:21:57 UTC
Description of problem:
If the gluster service is enabled on the host being put into maintenance, then the glusterd and brick (glusterfsd) processes on that host should be stopped.

This is required to stop clients from accessing the data if the host is going to be upgraded.
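
The intended host-side teardown would look roughly like the following; a minimal Python sketch, assuming a systemd-managed glusterd. The function name and error handling are illustrative, not the actual VDSM verb:

    import subprocess

    def stop_gluster_processes():
        # Stop the glusterd management daemon (assumes systemd).
        subprocess.check_call(["systemctl", "stop", "glusterd"])
        # Brick (glusterfsd) and auxiliary (glusterfs) processes are
        # not systemd units, so terminate them by name.
        for proc in ("glusterfsd", "glusterfs"):
            # pkill exits 1 when no process matched; treat that as OK.
            rc = subprocess.call(["pkill", proc])
            if rc not in (0, 1):
                raise RuntimeError("pkill %s failed (rc=%d)" % (proc, rc))

    if __name__ == "__main__":
        stop_gluster_processes()

Once glusterd and the brick processes are down, clients can no longer reach this host's copy of the data, which is what makes the upgrade safe.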

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
NA

Comment 1 Itamar Heim 2015-03-29 08:55:27 UTC
should this be optional?
should it be blocked if another host in the brick-set is still healing?

Comment 2 Sahina Bose 2015-04-06 12:16:50 UTC
I think in such cases, putting the host into maintenance mode should be blocked.
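
The heal check being discussed could be sketched as follows; a hedged sketch, assuming the standard `gluster volume heal <vol> info` CLI, whose output carries a "Number of entries: N" line per brick. Function and volume names are illustrative:

    import re
    import subprocess

    def pending_heal_entries(volume):
        # Ask gluster for the per-brick list of files awaiting heal.
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"],
            universal_newlines=True)
        # Sum the "Number of entries: N" counters across all bricks.
        return sum(int(n) for n in
                   re.findall(r"Number of entries:\s*(\d+)", out))

    def safe_to_enter_maintenance(volumes):
        # Block maintenance while any brick still has pending heals.
        return all(pending_heal_entries(v) == 0 for v in volumes)

A non-zero total would mean another host in the brick-set is still healing, so the move to maintenance should be refused.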

Comment 3 Shubhendu Tripathi 2015-07-14 09:04:44 UTC
BZs #1213291 and #1196433 would take care of not allowing other nodes to be moved to maintenance state.

Comment 4 Red Hat Bugzilla Rules Engine 2015-10-19 10:52:37 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 5 Sahina Bose 2015-12-11 06:24:53 UTC
Retargeting as this is required for HC mode operations

Comment 6 Sandro Bonazzola 2015-12-22 13:08:36 UTC
Please don't leave bugs assigned to bugs@ovirt.org when you take it.

Comment 7 Sahina Bose 2015-12-23 05:32:21 UTC
Moving the host checks performed before entering maintenance to a separate BZ.

Comment 8 Sandro Bonazzola 2015-12-23 13:41:28 UTC
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA

Comment 9 Ian Morgan 2016-01-31 16:28:34 UTC
> Itamar Heim 2015-03-29 04:55:27 EDT
>
>should this be optional?
>should it be blocked if another host in the brick-set is still healing?

> Shubhendu Tripathi 2015-07-14 05:04:44 EDT
>
>The BZs #1213291 and #1196433 would take care of not allowing other
>nodes to maintenance state.

I definitely agree that this should be optional, and definitely NOT the default behaviour, until the two above-noted BZs are implemented as well! As it stands now, 3.6.2 released with this change has a very high likelihood of breaking my oVirt cluster when moving a node to maintenance without performing very careful checks on the nodes first.

Even if the nodes are in a state where it is "safe" to stop one of the glusterd instances, doing so will unnecessarily require a heal to be performed after re-activating the node.

Perhaps this isn't such a big deal in "big" clusters with dozens of nodes, but on a minimal 3-node cluster, taking one gluster node offline unnecessarily is a big risk.

There are many reasons to want to put the VDSM/hypervisor into maintenance, but NOT disrupt the gluster daemons on the same node!

As it works now (<= 3.6.1), I can safely move a node to maintenance from the vdsm/hypervisor point of view without affecting the operational state of the gluster volumes.

I am now stuck on 3.6.1 until this change is reverted or augmented to be optional when placing a node into maintenance.

I recommend that this change be reverted, and this BZ should be set to depend on the two above-noted BZs.

Comment 11 Sahina Bose 2016-02-01 08:51:15 UTC
Will rework this to make it optional; I agree with the user's concerns.

Raised bug 1303539 to track it.

Comment 12 SATHEESARAN 2016-02-25 10:29:24 UTC
Tested with RHEV 3.6.3.3 and RHGS 3.1.2 RC (glusterfs-3.7.5.19.el7rhgs).

Gluster services are no longer stopped unconditionally; instead, an option is provided to stop the gluster services while moving the host to maintenance.
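
For reference, the optional behaviour can be driven programmatically as in the sketch below, which uses the later oVirt Python SDK v4 (the stop_gluster_service parameter exists there; the exact 3.6-era API may differ, and the engine URL, credentials, and host name are placeholders):

    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="secret",
        insecure=True,  # skip CA verification; lab use only
    )
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search="name=host1")[0]
    # Ask the engine to also stop glusterd and the brick processes
    # while moving the host to maintenance.
    hosts_service.host_service(host.id).deactivate(
        stop_gluster_service=True)
    connection.close()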

