Bug 1189285

Summary: [RFE] Putting a RHS node into maintenance mode currently seems to do nothing.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Harold Miller <hamiller>
Component: rhsc
Assignee: Ramesh N <rnachimu>
Status: CLOSED ERRATA
QA Contact: Triveni Rao <trao>
Severity: medium
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: asrivast, byarlaga, divya, knarra, mbukatov, mkalinin, nlevinki, olim, rhs-bugs, rnachimu, sabose, sankarshan, sashinde, sasundar, sgraf
Target Milestone: ---
Keywords: FutureFeature, ZStream
Target Release: RHGS 3.1.2
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: rhsc-3.1.2-0.68, vdsm-4.16.30-1.1
Doc Type: Enhancement
Doc Text:
Previously, when a Red Hat Gluster Storage node was put into maintenance mode, it did not stop all gluster-related processes as expected. Moving a Red Hat Gluster Storage node to maintenance must stop all gluster-related processes so that the node is ready for maintenance activities such as upgrade and repair. With this fix, all gluster-related processes are stopped when a node is moved to maintenance mode, and the glusterd service is restarted when the node is activated. Restarting the glusterd service starts all gluster-related processes, such as the brick, self-heal, and geo-replication processes.
Story Points: ---
Clone Of:
Clones: 1291173 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:10:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1230247, 1277562, 1286636, 1286638, 1289092, 1294754    
Bug Blocks: 1260783, 1291173    

Description Harold Miller 2015-02-04 21:35:59 UTC
RFE: putting RHS nodes into maintenance mode currently has no effect on the gluster cluster or the gluster node. The expected behavior would be to stop gluster-related services on the requested node, or at least to disable glusterd. Ideally both.
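
As a rough illustration, quiescing a node by hand would look something like the sketch below (a minimal sketch only; the exact set of services and processes that maintenance mode should stop is what this RFE asks RHS-C/VDSM to decide, and the unit and process names here are assumptions):

  # stop the gluster management daemon on the node going into maintenance
  systemctl stop glusterd
  # stop brick processes and remaining gluster helpers (self-heal, NFS, geo-rep workers)
  pkill glusterfsd
  pkill glusterfs
  # optionally keep glusterd from starting again on reboot while the node is under maintenance
  systemctl disable glusterd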

Comment 2 Harold Miller 2015-02-11 21:03:00 UTC
A related feature would be to detect when an RHS node is in maintenance mode (all gluster services stopped) or faulty (any required gluster* service stopped or unresponsive).
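
A sketch of what such detection could check on a node (illustrative only; treating glusterd plus the brick processes as the "required gluster services" is an assumption, not a definition taken from this bug):

  #!/bin/bash
  # classify the node based on gluster service state (assumed criteria)
  if ! systemctl is-active --quiet glusterd && ! pgrep -x glusterfsd >/dev/null; then
      echo "maintenance: glusterd and all brick processes are stopped"
  elif ! systemctl is-active --quiet glusterd || ! pgrep -x glusterfsd >/dev/null; then
      echo "faulty: a required gluster service is stopped"
  else
      echo "ok: glusterd and brick processes are running"
  fi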

This was requested in SFDC 01357740.

We would then need to update our documentation at:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html-single/Installation_Guide/index.html

and in the RHS 3.x manual as well.

Comment 3 Harold Miller 2015-02-19 21:02:04 UTC
We now have several tickets open regarding RHS-C, RHS, and Maintenance Mode.

We need:

Support of 'Maintenance Mode' on RHS

RHS-C to be capable of setting a RHS node into and out of Maintenance Mode

RHS-C to be capable of detecting when a RHS node is in 'Maintenance Mode'

RHS-C/VDSM to not start RHS nodes that are flagged in 'Maintenance Mode'

Comment 6 Harold Miller 2015-08-03 14:22:43 UTC
RHGS 3.1 documentation implies that maintenance mode is implemented and not only functional, but required. (https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Console_Administration_Guide/sect-Maintaining_Hosts.html#Moving_Hosts_into_Maintenance_Mode)

Is this BZ obsolete, or is RHGS 3.1 broken/incomplete?

Comment 7 Martin Bukatovic 2015-08-03 14:45:06 UTC
(In reply to Harold Miller from comment #6)
> RHGS 3.1 documentation implies that maintenance mode is implemented and not
> only functional, but required.
> (https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/
> Console_Administration_Guide/sect-Maintaining_Hosts.
> html#Moving_Hosts_into_Maintenance_Mode)
> 
> Is this BZ obsolete, or is RHGS 3.1 broken/incomplete?

I don't think this BZ is obsolete; the documentation from the 3.1 release is actually
the same as the version from the 3.0 release. See and compare:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Console_Administration_Guide/sect-Maintaining_Hosts.html
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Console_Administration_Guide/sect-Maintaining_Hosts.html

Comment 9 SATHEESARAN 2015-09-18 07:21:04 UTC
BZ 1230247 is the RFE to stop all gluster services in a single go.
If that requirement is implemented, then it becomes simple for the RHEV workflow to implement MAINTENANCE mode for an RHGS node.

Comment 10 Oonkwee Lim 2015-09-18 15:11:45 UTC
Yes, but meanwhile the documentation needs to be changed to just run one "pkill" command.

That is the customer's request.
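
For reference, the kind of one-command interim step the documentation could describe would look roughly like this (an illustrative sketch only; the pattern passed to pkill is an assumption, not taken from this bug or the official docs):

  # stop the management daemon first, then kill every remaining gluster process in one go
  systemctl stop glusterd
  pkill gluster    # matches glusterd leftovers, glusterfsd bricks, and glusterfs helper processes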

Thanks & Regards

Oonkwee
Emerging Technologies
RedHat Global Support

Comment 13 Triveni Rao 2015-12-26 19:37:43 UTC
This bug is verified with the fixed-in versions rhsc-3.1.2-0.69.el6.noarch and vdsm-4.16.30-1.3.el7rhgs.x86_64.

A couple of tests were performed to make sure the maintenance mode is effective from RHSC.

1. Created 2 clusters with 2 nodes and created a few volumes.
2. Put one of the nodes into maintenance.
3. Confirmed from the backend that the glusterd process had stopped and that no brick processes were running.
4. Tried with a geo-rep session between the clusters: put the master node/slave node into maintenance mode and checked from the backend (see the command sketch below) that it works fine, with the geo-rep session showing faulty; once the node was activated, the status showed OK.
5. Tried to upgrade the host to a new build of RHGS 3.1.2 by putting the node into maintenance mode; this works fine without showing any upgrade-related issues.
6. Confirmed that a 3-way replica works fine by putting all the nodes into maintenance and bringing the nodes back one after the other.

Overall, all gluster-related processes were stopped once the node was put into maintenance mode, and the processes were back online once the node was activated.
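
For step 4, the backend check can be done with the geo-replication status command (shown for illustration; the volume and host names below are placeholders, not the ones used in this setup):

  # check the geo-rep session state from the backend (placeholder names)
  gluster volume geo-replication mastervol slavehost::slavevol status
  # while the master/slave node is in maintenance, the session shows Faulty;
  # after activation it returns to Active/Passive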

Version:
vdsm-4.16.30-1.3.el7rhgs.x86_64 and vdsm-4.16.30-1.3.el6rhgs.x86_64


Output:

After maintenance mode:

[root@dhcp35-215 ~]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Sat 2015-12-26 14:41:36 EST; 29s ago
 Main PID: 31684 (code=exited, status=0/SUCCESS)

Dec 26 14:35:32 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 26 14:35:34 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
Dec 26 14:41:36 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Stopping GlusterFS, a clustered file-system server...
Dec 26 14:41:36 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Stopped GlusterFS, a clustered file-system server.
[root@dhcp35-215 ~]# 
[root@dhcp35-215 ~]# ps aux | grep glusterfsd
root      2358  0.0  0.0 112644   960 pts/0    S+   14:42   0:00 grep --color=auto glusterfsd
[root@dhcp35-215 ~]# ps aux | grep glusterfs
root      2360  0.0  0.0 112644   952 pts/0    S+   14:42   0:00 grep --color=auto glusterfs
[root@dhcp35-215 ~]# ps aux | grep ssh
root      1385  0.0  0.0  82548  3628 ?        Ss   06:15   0:00 /usr/sbin/sshd -D
root      2304  0.2  0.1 142860  5168 ?        Ss   14:41   0:00 sshd: root@pts/0
root      2362  0.0  0.0 112648   956 pts/0    S+   14:42   0:00 grep --color=auto ssh
[root@dhcp35-215 ~]# rpm -qa | grep vdsm


After activating the host/node:

[root@dhcp35-215 ~]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2015-12-26 14:51:49 EST; 23s ago
  Process: 2512 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2513 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─2513 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Dec 26 14:51:47 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 26 14:51:49 dhcp35-215.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@dhcp35-215 ~]#

Comment 16 errata-xmlrpc 2016-03-01 06:10:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0310.html

Comment 17 Red Hat Bugzilla 2023-09-14 02:54:23 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days